
What happens to identifiers in a program?

I'm a novice programmer. I just wanted to see the output at the different phases: compilation, assembling, and linking. I don't know assembly language either.

I wrote a simple program

#include <stdio.h>

int populate(int crappyVariable);  /* declared before its first use in main */

int humans = 9;

int main(void)
{
    int lions = 2;
    int cubs = populate(lions);
    return 0;
}

int populate(int crappyVariable)
{
    return ++crappyVariable;
}

I used gcc -S sample.c and I'm surprised by the assembly language output. I lost all the variable names and function names.

It preserved the global identifiers like humans, populate, and main, but it prefixed them with an underscore (_), so I don't count that as really using the identifiers. Anyway, the point is that it lost all the other identifiers.

My question is how would it call functions or refer to variables?

I'm really curious about further stages of output, which would be in binary (which is not viewable).

What would the output look like just after assembling and before linking? I guess it will lose even the underscore-prefixed global identifiers? Then again, the question is how it would call functions or refer to variables for operations.

I searched for info on the internet but couldn't find anything useful. Maybe I'm not sure what to search for. I don't want to read big books on this, but any articles or tutorials that make the concepts clear would also be helpful.

I'm a novice programmer, so it would be great if you could explain in simple but technical terms.

EDIT: In response to the comment, I broke my question into multiple questions. Here is the 2nd part of this question: not clear with the job of the linker

Alice asked Dec 31 '09 18:12




2 Answers

At the basic machine level, there are no more names, just numeric addresses for variables and code. Thus, once your code is translated to machine language, the names are gone for practical purposes.

If you compile with a "to assembler" option or disassemble code, you may see some identifiers; they're there to help you find your way around the code, as you're not expected to be computing data/code offsets in your head unnecessarily.

To answer your question about linking and such: Labels and identifiers that are only used "inside" a C program file are gone once the program is compiled to relocatable object form. However, externally defined names, such as main() are needed because external modules will reference them; so a compiled object file will contain a little table listing the externally visible names and which location they refer to. A linker can then patch together external references into your module from others (and vice versa) based on those names.

After linking, even the externally defined names aren't needed any more. If you compile with debug options, tables of names may still be attached to the final program, though, so you can use those names when debugging your program.

Carl Smotricz answered Oct 11 '22 02:10


You really need to read up on compilers and compiler design. Start with http://www.freetechbooks.com/compiler-design-and-construction-f14.html

Here's the summary.

The goal is to get stuff copied into memory that will execute and run. Then the OS hands control over to that stuff.

The "loader" copies stuff into memory from various files. These files are actually a kind of language describing where stuff goes in memory and what goes in those places. It's a kind of "load memory" language.

The job of compiler and linker is to create files that will make the loader do the right thing.

The compiler's output is "object" files: essentially loader instructions in many small, fragmented files with many external references. Ideally, each object file is machine code with placeholders where external references will be plugged in. All the internal references have already been resolved to offsets into static memory, offsets into stack frames, or function names.

The linker's output is larger loader files with fewer external references. It's largely the same as the compiler's output in format. But it has more stuff folded in.

Read this on the ld command: http://linux.about.com/library/cmd/blcmdl1_ld.htm

Read this on the nm command: http://linux.about.com/library/cmd/blcmdl1_nm.htm

Here are some details.

"...how would it call functions or refer to variables?"

The function names, generally, are preserved until the later stages of producing output.

The variable names are transformed into something else. "Global" variables are allocated statically, and the compiler keeps a map from variable name to type to offset into the static data area (this is separate from the heap, which is only used for run-time allocation such as malloc).

Local variables within a function are (usually) allocated in the stack frame. The compiler has a map from variable name to type to offset into the stack frame. When the function is entered, a stack frame of the required size is allocated and the variables are simply offsets into that frame.

"...how would it call functions or refer to variables for operations?"

You have to provide a hint to the compiler. The extern keyword tells the compiler that a name is not defined in this module, but is defined in another module and the reference must be resolved at link (or load) time.

"...if there is nothing to link..."

This is never true. Your program is only one piece of the overall executable. Most C libraries include the real main program which then calls your function named "main".

"will the linker change the object code output of assembler?"

This varies a lot from OS to OS. On many OSes, linking and loading happen together. What often happens is that the output from the C compiler is put into an archive without much resolution having been performed.

When the executable is loaded into memory, the archive references and any external shared object files are loaded, also.

"The program is not running, its just in the manufacturing stage."

This doesn't mean anything. Not sure why you're including this.

"How could linker map to memory? How would it look like?"

The OS will allocate a block of memory into which the executable program must be copied. The linker/loader reads the object file and any object archive files, and copies the stuff in those files into that memory. The linker does the copying and name resolution and writes a new object file that's more complete. The loader does the same into real memory and turns execution over to the resulting text page.

"Its at the run time right?"

That's the only way to debug -- run time. It can't mean anything else, or it's not debugging.

S.Lott answered Oct 11 '22 03:10