I have a question about ELF dynamic symbol table. For symbols of type FUNC, I have noticed a value of 0 in some binaries. But in other binaries, it has some non-zero value. Both these binaries were generated by gcc, I want to know why is this difference?. Is there any compiler options to control this?
EDIT: This is the output of readelf --dyn-syms prog1
Symbol table '.dynsym' contains 5 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 00000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
2: 000082f0 0 FUNC GLOBAL DEFAULT UND printf@GLIBC_2.4 (2)
3: 00008314 0 FUNC GLOBAL DEFAULT UND abort@GLIBC_2.4 (2)
4: 000082fc 0 FUNC GLOBAL DEFAULT UND __libc_start_main@GLIBC_2.4
Here value of "printf" symbol is 82f0 which happens to be the address of plt table entry for printf.
Output of readelf --dyn-syms prog2
Symbol table '.dynsym' contains 6 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 00000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
2: 00000000 0 FUNC GLOBAL DEFAULT UND puts@GLIBC_2.4 (2)
3: 00000000 0 FUNC GLOBAL DEFAULT UND printf@GLIBC_2.4 (2)
4: 00000000 0 FUNC GLOBAL DEFAULT UND abort@GLIBC_2.4 (2)
5: 00000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main@GLIBC_2.4
Here the values for all the symbols are zero.
The x86_64 SV ABI mandates that (emphasis mine):
To allow comparisons of function addresses to work as expected, if an executable file references a function defined in a shared object, the link editor will place the address of the procedure linkage table entry for that function in its associated symbol table entry. This will result in symbol table entries with section index of SHN_UNDEF but a type of STT_FUNC and a non-zero st_value. A reference to the address of a function from within a shared library will be satisfied by such a definition in the executable.
With my GCC, this program:
#include <stdio.h>
int main()
{
printf("hello %i\n", 42);
return 0;
}
when compiled directly into an executable generates a null value:
1: 0000000000000000 0 FUNC GLOBAL DEFAULT UND printf@GLIBC_2.2.5 (2)
But this program with a comparison of the printf
function:
#include <stdio.h>
int main()
{
printf("hello %i\n", 42);
if (printf == puts)
return 1;
return 0;
}
generates a non-null value:
3: 0000000000400410 0 FUNC GLOBAL DEFAULT UND printf@GLIBC_2.2.5 (2)
In the .o file, the first program generates:
000000000014 000a00000002 R_X86_64_PC32 0000000000000000 printf - 4
and the second:
000000000014 000a00000002 R_X86_64_PC32 0000000000000000 printf - 4
000000000019 000a0000000a R_X86_64_32 0000000000000000 printf + 0
The difference is caused by the extra R_X86_64_32
relocation for getting the address of the function.
Observations by running readelf
on some binary
All the FUNCTIONS which are UNDEFINED have size zero.
These undefined functions are those which are called through libraries. In my small ELF binary all references to GLIBc are undefined with size zero
From http://docs.oracle.com/cd/E19457-01/801-6737/801-6737.pdf on page 21
It becomes clear that symbol table can have three types of symbols. Among these three, two types UNDEFINED and TENTATIVE symbols are those which are with out storage assigned. in later case you can see in readelf output, some functions which are not undefined(have index) and does not have storage.
for clarity undefined symbols are those which are referenced but does not assign storage(have not been created yet) while tentative symbols are those which are created but w/o assigned storage. e.g uninitialized symbols
edit
if you are talking about .plt, shared libraries symbols bind is lazy.
how to control the bind see http://www.linuxjournal.com/article/1060
This feature is known as lazy symbol binding. The idea is that if you have lots of shared libraries, it could take the dynamic loader lots of time to look up all of the functions to initialize all of the .plt slots, so it would be preferable to defer binding addresses to the functions until we actually need them. This turns out to be a big win if you only end up using a small fraction of the functions in a shared library. It is possible to instruct the dynamic loader to bind addresses to all of the .plt slots before transferring control to the application—this is done by setting the environment variable LD_BIND_NOW=1 before running the program. This turns out to be useful in some cases when you are debugging a program, for example. Also, I should point out that the .plt is in read-only memory. Thus the addresses used for the target of the jump are actually stored in the .got section. The .got also contains a set of pointers for all of the global variables that are used within a program that come from a shared library.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With