What happens to identifiers in a program?

I'm a novice programmer. I just wanted to see the output at the different phases: compilation, assembling, and linking. I don't know assembly language either.

I wrote a simple program:

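#include <stdio.h>

int humans = 9;

/* Declared before main() so the call below is not an implicit declaration. */
int populate(int crappyVariable);

int main(void)
{
    int lions = 2;
    int cubs = populate(lions);
    return 0;
}

int populate(int crappyVariable)
{
    return ++crappyVariable;
}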

I used gcc -S sample.c and was surprised by the assembly language output: I lost all the variable names and function names.

It preserved the global identifiers like humans, populate, and main, but it prefixed them with underscores (_), so I won't count that as using identifiers. Anyway, the point is that it lost all the identifiers.

My question is: how would it call functions or refer to variables?

I'm really curious about the further stages of output, which would be in binary (and so not directly viewable).

What would the output look like just after assembling and before linking? I guess it will lose even the underscore-prefixed global identifiers too? Then the question again is: how would it call functions or refer to variables for operations?

I searched for information on the internet but couldn't find anything useful; maybe I'm not sure what to search for. I don't want to read big books on this, but if there are any articles or tutorials that clear up the concepts, those would also be helpful.

I'm a novice programmer, so it would be great if you could explain in simple but technical terms.

EDIT: In response to the comment, I broke my question into multiple questions. Here is the 2nd part of this question: not clear with the job of the linker

asked Dec 31 '09 18:12

Alice

2 Answers

At the basic machine level, there are no more names, just numeric addresses for variables and code. Thus, once your code is translated to machine language, the names are gone for practical purposes.

If you compile with a "to assembler" option or disassemble code, you may see some identifiers; they're there to help you find your way around the code, as you're not expected to be computing data/code offsets in your head unnecessarily.

To answer your question about linking and such: labels and identifiers that are only used "inside" a C program file are no longer needed once the program is compiled to relocatable object form. However, externally visible names, such as main(), are needed because other modules will reference them; so a compiled object file contains a small table (the symbol table) listing the externally visible names and the locations they refer to. A linker can then patch together references between your module and others (in both directions) based on those names.

After linking, even the externally defined names aren't needed any more. If you compile with debug options, tables of names may still be attached to the final program, though, so you can use those names when debugging your program.

answered Oct 11 '22 02:10

Carl Smotricz

You really need to read up on compilers and compiler design. Start with http://www.freetechbooks.com/compiler-design-and-construction-f14.html

Here's the summary.

The goal is to get stuff copied into memory that will execute and run. Then the OS hands control over to that stuff.

The "loader" copies stuff into memory from various files. These files are actually a kind of language describing where stuff goes in memory and what goes in those places. It's a kind of "load memory" language.

The job of compiler and linker is to create files that will make the loader do the right thing.

The compiler's output is "object" files: essentially loader instructions in many small fragmented files with many external references. Ideally, it is machine code with placeholders where external references will be plugged in. All the internal references have been resolved as offsets into static memory or stack frames, or as function names.

The linker's output is larger loader files with fewer external references. It's largely the same as the compiler's output in format. But it has more stuff folded in.

Read this on the ld command: http://linux.about.com/library/cmd/blcmdl1_ld.htm

Read this on the nm command: http://linux.about.com/library/cmd/blcmdl1_nm.htm

Here are some details.

"...how would it call functions or refer to variables?"

The function names, generally, are preserved until the later stages of producing output.

The variable names are transformed into something else. "Global" variables are allocated statically, and the compiler has a map from variable name to type to offset into the static data area.

Local variables within a function are (usually) allocated in the stack frame. The compiler has a map from variable name to type to offset into the stack frame. When the function is entered, a stack frame of the required size is allocated and the variables are simply offsets into that frame.

"...how would it call functions or refer to variables for operations?"

You have to provide a hint to the compiler. The extern keyword tells the compiler that a name is not defined in this module, but is defined in another module and the reference must be resolved at link (or load) time.

"...if there is nothing to link..."

This is never true. Your program is only one piece of the overall executable. Most C libraries include the real main program which then calls your function named "main".

"will the linker change the object code output of assembler?"

This varies a lot with the OS. In many OSes, linking and loading happen at once. What often happens is that the output from the C compiler is thrown into an archive without much resolution having been performed.

When the executable is loaded into memory, the archive references and any external shared object files are loaded, also.

"The program is not running, its just in the manufacturing stage."

This doesn't mean anything. Not sure why you're including this.

"How could linker map to memory? How would it look like?"

The OS will allocate a block of memory into which the executable program must be copied. The linker/loader reads the object file and any object archive files, and copies the stuff in those files into that memory. The linker does the copying and name resolution and writes a new object file that's more complete. The loader does the same into real memory and turns over execution to the resulting text page.

"Its at the run time right?"

That's the only way to debug -- run time. It can't mean anything else, or it's not debugging.

answered Oct 11 '22 03:10

S.Lott

Tags: assembly, compiler-construction, linker
