I am learning OS development in a Linux environment using GCC. I learnt in Bran's Kernel Development that all function and variable names in C are prefixed with an "_" (underscore) in the corresponding assembly source file when compiled. But when I went through the assembly source of a compiled C program, I couldn't find a "_main" function at all. I ran the following:
cpp sample.c sample.i
gcc -S sample.i
That was true in the early days. A given C function foo would show up as _foo in the assembler. This was done to avoid conflicts with hand generated .s files.
It would also be limited to 8 characters total [a linker restriction].
This hasn't been true for decades. Now, symbols are no longer prefixed with _ and can be much longer than 8 characters.
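Rather than guessing, you can ask the compiler which prefix it uses on the current target. A minimal sketch, assuming GCC or Clang, both of which predefine the macro __USER_LABEL_PREFIX__; the helper names user_label_prefix and print_asm_name are hypothetical and only for illustration:

```c
#include <stdio.h>

/* Two-step stringification so the macro is expanded before it is quoted. */
#define STRINGIFY_EXPANDED(x) #x
#define STRINGIFY(x) STRINGIFY_EXPANDED(x)

/* __USER_LABEL_PREFIX__ is predefined by GCC and Clang. On other compilers
   we assume an empty prefix, which may be wrong for that toolchain. */
#ifdef __USER_LABEL_PREFIX__
#define LABEL_PREFIX STRINGIFY(__USER_LABEL_PREFIX__)
#else
#define LABEL_PREFIX ""
#endif

/* Hypothetical helper: the prefix the compiler puts in front of C symbols. */
const char *user_label_prefix(void)
{
    return LABEL_PREFIX;
}

/* Hypothetical helper: print the assembler-level name of a C identifier. */
void print_asm_name(const char *c_name)
{
    printf("%s -> %s%s\n", c_name, user_label_prefix(), c_name);
}
```

Calling print_asm_name("main") from a small test program shows the name to look for in the output of gcc -S. On a typical Linux/ELF target the prefix is empty; on targets that still use the underscore it is "_".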
UPDATE:
So, nowadays GCC does not put a _ in front of functions and variables?
For the most part, no. IMO, the reference you're citing, on this point, does seem to be a bit dated.
Most POSIX systems (e.g. linux, *BSD) use gcc [or clang] and they leave off the _.
When I first started programming in C [circa 1981], the _ was still being used. This was on AT&T Unix v7, System III, and System V.
IIRC, it was gone by the early 1990s for newer systems (like linux). Personally, I haven't encountered the _ prefix since then, but I've [mostly] used linux [and sometimes cygwin].
Some AT&T Unix derived systems may have kept it around for backward compatibility, but, eventually, most everybody standardized on "foo is foo". I don't have access to OSX, so I can't rule out Johnathan's comment regarding that.
The _ had been around since the early days of Unix (circa 1970). This was before my time, but, IIRC, Unix was originally written in assembler. It was converted to C. The _ was to demarcate functions either written in C, or asm ones that could be called from C functions.
Those that didn't have the prefix were "asm only" [as they may have used non-standard calling conventions]. Back in the day, everything was precious: RAM, CPU cycles, etc.
So, asm functions could/would use "tricks" to conserve resources. Several asm functions could work as a group because they knew about one another.
If a given asm function could be called from C, the _ prefixed symbol was the C compatible "wrapper" for it [that did extra save/restore in the prolog/epilog].
So, I can just call the main function of a C program as "call main" instead of "call _main"?
That's a reasonably safe bet.
If you're calling a given function from C, it will automatically do the right thing (i.e. add prefix or not).
It's only when trying to call a C function from hand generated assembler that the issue might even come up.
So, for asm, I'd just do the simple thing and do call main. It will work on most [if not all] systems.
If you wanted to "bullet proof" your code, you could run your asm through the C preprocessor (via a .S file) and do (e.g.):
#ifdef C_USES_UNDERSCORE
#define CF(_x) _##_x
#else
#define CF(_x) _x
#endif
call CF(main)
But, I think that's overkill.
It also illustrates the whole problem with the _ prefix thing. On a modern system [with lots of memory and CPU cycles], why should an assembler function have to know whether an ABI compatible function it is calling was generated from C or hand written assembler?
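As a variation on the CF wrapper above, the prefix can be taken from the compiler instead of a hand-maintained C_USES_UNDERSCORE switch. This is only a sketch, assuming GCC or Clang (which predefine __USER_LABEL_PREFIX__); the CONCAT and STRINGIFY helper macros and the asm_name_of_main function are hypothetical names added for illustration. The macro part would work the same way in a preprocessed .S file; the C function exists only so the result can be inspected:

```c
/* Two-step token pasting so the prefix macro is expanded before ## is applied. */
#define CONCAT_EXPANDED(a, b) a##b
#define CONCAT(a, b) CONCAT_EXPANDED(a, b)

/* CF(x) yields the assembler-level name of the C symbol x. If the compiler
   does not predefine __USER_LABEL_PREFIX__, we assume there is no prefix. */
#ifdef __USER_LABEL_PREFIX__
#define CF(x) CONCAT(__USER_LABEL_PREFIX__, x)
#else
#define CF(x) x
#endif

/* Two-step stringification, used here only to make the result visible. */
#define STRINGIFY_EXPANDED(x) #x
#define STRINGIFY(x) STRINGIFY_EXPANDED(x)

/* Hypothetical helper: returns "main" or "_main" depending on the target. */
const char *asm_name_of_main(void)
{
    return STRINGIFY(CF(main));
}
```

In a .S file, the line call CF(main) would then expand to call main or call _main without any per-platform #ifdef.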
As detailed by Craig, it's a convention that modern formats/ABIs like COFF and ELF don't follow anymore.
On some targets, that use different ABIs, it's still in use. Examples are NeXT/OS X's Mach-O or 16- and 32-bit Windows. 64-bit Windows doesn't use the underscore anymore (although GCC continued doing so for a time, till 4.5.1 specifically).
Additionally, the underscore might appear as part of a bigger prefix. For example, __imp_ in __declspec(dllimport) symbols or _Z in names mangled according to the Itanium C++ ABI.
If, for some reason, you need to influence the mangling, GCC provides the -f[no-]leading-underscore flag. This will break ABI compatibility.
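If only a handful of symbols need a fixed name (for example, an entry point called from hand-written assembly), a per-symbol alternative is the asm label extension supported by GCC and Clang, which sets the assembler name of a single declaration verbatim. A minimal sketch; the function name kernel_entry and its return value are hypothetical:

```c
/* The asm label pins the assembler-level name to exactly "kernel_entry",
   with no prefix added, regardless of the target's default convention.
   This is a GCC/Clang extension, not standard C. */
int kernel_entry(void) __asm__("kernel_entry");

/* Hypothetical function body, only here so the symbol has a definition. */
int kernel_entry(void)
{
    return 42;
}
```

C callers are unaffected, because the compiler uses the same pinned name for calls, while assembly code can rely on call kernel_entry on every target.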