The Linker Is Not a Magical Program

Revision as of 00:31, 4 February 2009

Depressingly often (happened to me again just before I wrote this), the view many programmers have of the process of going from source code to a statically linked executable in a compiled language is:

  1. Edit source code
  2. Compile source code into object files
  3. Something magical happens
  4. Run executable

Step 3 is, of course, the linking step. Why would I say such an outrageous thing? I've been doing tech support for decades, and I get the following questions again and again:

  1. The linker says def is defined more than once.
  2. The linker says abc is an unresolved symbol.
  3. Why is my executable so large?

These are followed by "What do I do now?", usually with the phrases "seems to" and "somehow" mixed in, and an aura of utter bafflement. It's the "seems to" and "somehow" that indicate that the linking process is viewed as a magical process, presumably understandable only by wizards and warlocks. The process of compiling does not elicit these kinds of phrases, implying that programmers generally understand how compilers work, or at least what they do.

A linker is a very stupid, pedestrian, straightforward program. All it does is concatenate the code and data sections of the object files, connect the references to symbols with their definitions, pull unresolved symbols out of the library, and write out an executable. That's it. No spells! No magic! The tedium in writing a linker is mostly about decoding and generating the usually ridiculously over-complicated file formats, but that doesn't change the essential nature of a linker.
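
To make those steps concrete, here is a toy model in Python. It is only a sketch: the ObjectFile layout, the link function, and the file names are invented for illustration, and no real object format or linker works on data this simple. It does show the three jobs described above: concatenating sections, binding symbols to addresses, and pulling a library module in only when something references it.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectFile:
    """Invented stand-in for an object file; real formats carry far more."""
    name: str
    code: bytes                                      # section to concatenate
    defines: dict = field(default_factory=dict)      # symbol -> offset in code
    references: list = field(default_factory=list)   # symbols it needs


def link(objects, library):
    """Concatenate sections, resolve symbols, pull library modules on demand."""
    image = b""
    symbols = {}   # symbol -> (defining object file, address in image)
    linked = []    # every object file that ends up in the executable

    def add(obj):
        nonlocal image
        base = len(image)
        image += obj.code
        for sym, offset in obj.defines.items():
            if sym in symbols:
                raise ValueError(f"{sym} is defined more than once: "
                                 f"{symbols[sym][0]} and {obj.name}")
            symbols[sym] = (obj.name, base + offset)
        linked.append(obj)

    for obj in objects:
        add(obj)

    # Only symbols that are still unresolved cause a library module to be added.
    pending = [(obj.name, sym) for obj in linked for sym in obj.references]
    while pending:
        referrer, sym = pending.pop()
        if sym in symbols:
            continue
        provider = next((m for m in library if sym in m.defines), None)
        if provider is None:
            raise ValueError(f"{sym} is an unresolved symbol, "
                             f"referenced from {referrer}")
        add(provider)
        pending.extend((provider.name, s) for s in provider.references)

    return image, symbols


main = ObjectFile("main.obj", b"\x01\x02\x03\x04", {"main": 0}, ["abc"])
util = ObjectFile("util.obj", b"\x05\x06", {"abc": 0}, ["strlen"])
libc = [ObjectFile("strlen.obj", b"\x07", {"strlen": 0}),
        ObjectFile("printf.obj", b"\x08" * 100, {"printf": 0})]

image, symbols = link([main, util], libc)
# printf.obj is never pulled in, because nothing references printf.
print(len(image), symbols)
```

Both error messages in the questions above fall out of this model directly: a second definition of a symbol that is already in the table, or a reference that neither the object files nor the library can satisfy.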

So let's say the linker is saying def is defined more than once. If it's a good linker, it'll tell you which object file(s) the symbol appears in, so you can look at them and remove the duplicate. If it doesn't tell you where the symbols are coming from, the easy way to figure it out is to use grep:

grep def *.obj

If, once those files are identified, it isn't clear from the corresponding source files where those symbols come from (this can happen with templates), try examining the object files with a dumper or disassembler.

If the linker is saying abc is an unresolved symbol, it probably says which object file it was referenced from. If it doesn't, once again grep rides to the rescue:

grep abc *.obj

which will tell you which object file, and hence which source file, it comes from. Then it's up to you to figure out who is supposed to define it, and make sure that definition is supplied to the linker.

Often, though, these undefined symbols are defined in some system library. Given the long list of system libraries, it isn't immediately obvious which needs to be linked in to resolve the symbol. Needless to say, once again grep is the magic solution:

grep abc \compiler\lib\*.lib

or wherever the system libraries are stored. The nice thing about using grep for this is that it doesn't matter what format the object files or libraries are in. I've never met one yet that didn't store symbol names as plaintext strings.
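
The reason grep works here is that it only needs the name to appear somewhere in the file's raw bytes. A rough Python equivalent, as a sketch: files_mentioning is a made-up helper, and the two .obj files in the demo are stand-in byte strings, not real object files.

```python
import tempfile
from pathlib import Path


def files_mentioning(symbol, paths):
    """Return the files whose raw bytes contain the symbol name."""
    needle = symbol.encode("ascii")
    return [str(p) for p in paths if needle in Path(p).read_bytes()]


# Demo with stand-in files; real ones would come from your build directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.obj").write_bytes(b"\x00\x17_abc\x00\x90\x90")
    (Path(tmp) / "b.obj").write_bytes(b"\x00\x17_def\x00\xc3")
    hits = files_mentioning("abc", sorted(Path(tmp).glob("*.obj")))
    names = [Path(h).name for h in hits]

print(names)   # prints ['a.obj']
```

Like grep, this is a plain substring match, so a search for abc also finds abcd. It does not tell a definition from a reference either; for that you still need the dumper.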

To determine why an executable is the size it is, take a look at the map file that linkers optionally generate. A map file is nothing more than a list of all the symbols in the executable along with their addresses. This tells you what modules were linked in from the library, and the size of each module. Now you can see where the bloat is coming from. Often there will be library modules that were linked in for no reason you can see. To figure it out, temporarily remove the suspicious module from the library and relink. The undefined symbol error this generates will indicate who is referencing that module.
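
As a sketch of reading sizes out of a map file, the Python below works on a deliberately simplified listing. The "address symbol module" layout, the sample entries, and the module_sizes helper are all assumptions made for the example; real map files differ from linker to linker, so a real script would need its own parsing. The idea is only that the gap between one symbol's address and the next gives an estimate of how much space each module takes.

```python
from collections import defaultdict

# Hypothetical, simplified map listing: "address symbol module" per line.
MAP_TEXT = """\
0000 main main.obj
0040 abc util.obj
0060 helper util.obj
0100 printf printf.obj
0900 _end -
"""


def module_sizes(map_text):
    """Estimate bytes per module from the gaps between symbol addresses."""
    entries = []
    for line in map_text.splitlines():
        address, symbol, module = line.split()
        entries.append((int(address, 16), symbol, module))
    entries.sort()

    sizes = defaultdict(int)
    # Each symbol is charged the distance to the next address in the list;
    # the final entry only marks the end and is charged nothing.
    for (addr, _, module), (next_addr, _, _) in zip(entries, entries[1:]):
        sizes[module] += next_addr - addr
    return dict(sizes)


sizes = module_sizes(MAP_TEXT)
for module, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{module:12} {size:6d} bytes")
```

Sorting by size puts the biggest contributor first, which is usually the module worth asking about.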

By Walter Bright

This work is licensed under a Creative Commons Attribution 3

