 

ELF dynamic symbol table

Tags: gcc, elf

I have a question about the ELF dynamic symbol table. For symbols of type FUNC, I have noticed a value of 0 in some binaries, but a non-zero value in others. Both kinds of binaries were generated by gcc. Why is there a difference, and are there any compiler options to control it?

EDIT: This is the output of readelf --dyn-syms prog1

Symbol table '.dynsym' contains 5 entries:
Num:    Value  Size Type    Bind   Vis      Ndx Name
 0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND 
 1: 00000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
 2: 000082f0     0 FUNC    GLOBAL DEFAULT  UND printf@GLIBC_2.4 (2)
 3: 00008314     0 FUNC    GLOBAL DEFAULT  UND abort@GLIBC_2.4 (2)
 4: 000082fc     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@GLIBC_2.4 

Here the value of the "printf" symbol is 82f0, which happens to be the address of the PLT entry for printf.

Output of readelf --dyn-syms prog2

Symbol table '.dynsym' contains 6 entries:
Num:    Value  Size Type    Bind   Vis      Ndx Name
 0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND 
 1: 00000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
 2: 00000000     0 FUNC    GLOBAL DEFAULT  UND puts@GLIBC_2.4 (2)
 3: 00000000     0 FUNC    GLOBAL DEFAULT  UND printf@GLIBC_2.4 (2)
 4: 00000000     0 FUNC    GLOBAL DEFAULT  UND abort@GLIBC_2.4 (2)
 5: 00000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@GLIBC_2.4 

Here the values for all the symbols are zero.

Vasant K asked Sep 23 '15 10:09


2 Answers

The x86-64 System V ABI mandates that:

To allow comparisons of function addresses to work as expected, if an executable file references a function defined in a shared object, the link editor will place the address of the procedure linkage table entry for that function in its associated symbol table entry. This will result in symbol table entries with section index of SHN_UNDEF but a type of STT_FUNC and a non-zero st_value. A reference to the address of a function from within a shared library will be satisfied by such a definition in the executable.
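As a rough illustration of the kind of entry that passage describes, here is a minimal C sketch built on the constants from <elf.h>. The helper name is_undef_func_with_address is made up for this example, and it assumes a 64-bit symbol (Elf64_Sym); the 8-digit values in the question suggest 32-bit binaries, which would use Elf32_Sym and ELF32_ST_TYPE instead.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of any library): returns true when a
 * dynamic symbol matches the case in the ABI text, i.e. it is
 * undefined (SHN_UNDEF), has type STT_FUNC, and still carries a
 * non-zero st_value (per the ABI, the address of its PLT entry). */
static bool is_undef_func_with_address(const Elf64_Sym *sym)
{
    assert(sym != NULL);
    return sym->st_shndx == SHN_UNDEF
        && ELF64_ST_TYPE(sym->st_info) == STT_FUNC
        && sym->st_value != 0;
}
```

Applied to every entry of .dynsym, a check like this would flag the same rows that show a non-zero Value next to UND in the readelf output.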

With my GCC, this program:

#include <stdio.h>

int main()
{
  printf("hello %i\n", 42);
  return 0;
}

when compiled directly into an executable, generates a zero value for printf:

 1: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND printf@GLIBC_2.2.5 (2)

But this program, which also compares the address of the printf function:

#include <stdio.h>

int main()
{
  printf("hello %i\n", 42);
  if (printf == puts)
    return 1;
  return 0;
}

generates a non-zero value:

 3: 0000000000400410     0 FUNC    GLOBAL DEFAULT  UND printf@GLIBC_2.2.5 (2)

In the .o file, the first program generates:

000000000014  000a00000002 R_X86_64_PC32     0000000000000000 printf - 4

and the second:

000000000014  000a00000002 R_X86_64_PC32     0000000000000000 printf - 4
000000000019  000a0000000a R_X86_64_32       0000000000000000 printf + 0

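The difference is caused by the extra R_X86_64_32 relocation, which is emitted because the program takes the address of the function. To satisfy it, the link editor uses the address of the PLT entry for printf and records that address in the symbol table entry, as the ABI text above describes.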
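To make the Info column of the relocation listings above easier to read, here is a small sketch that splits an r_info value into its symbol index and relocation type using the ELF64_R_SYM and ELF64_R_TYPE macros from <elf.h>. The function name decode_r_info is invented for this example.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative helper: split a 64-bit r_info field into the symbol
 * table index (upper 32 bits) and the relocation type (lower 32 bits). */
static void decode_r_info(Elf64_Xword r_info,
                          uint32_t *sym_index, uint32_t *type)
{
    assert(sym_index != NULL && type != NULL);
    *sym_index = (uint32_t)ELF64_R_SYM(r_info);
    *type = (uint32_t)ELF64_R_TYPE(r_info);
}
```

For 0x000a00000002 this yields symbol index 10 and type 2 (R_X86_64_PC32); for 0x000a0000000a it yields symbol index 10 and type 10 (R_X86_64_32). Both relocations therefore refer to the same symbol, printf, and differ only in type.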

ysdx answered Sep 17 '22 19:09


Observations from running readelf on a binary:

All the functions that are undefined (UND) have size zero.

These undefined functions are the ones provided by shared libraries. In my small ELF binary, all references to glibc are undefined with size zero.

From page 21 of http://docs.oracle.com/cd/E19457-01/801-6737/801-6737.pdf:

It becomes clear that a symbol table can hold three types of symbols. Two of these, undefined and tentative symbols, have no storage assigned. In the latter case you can see, in the readelf output, some functions that are not undefined (they have a section index) and yet have no storage.

For clarity: undefined symbols are those that are referenced but have no storage assigned (they have not been created yet), while tentative symbols are those that have been created but have no storage assigned yet, e.g. uninitialized symbols.
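A minimal sketch of that three-way distinction, assuming the usual ELF encoding in which undefined symbols have a section index of SHN_UNDEF and tentative symbols have SHN_COMMON. The enum and function names are hypothetical, and the code assumes 64-bit symbols (Elf64_Sym).

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Hypothetical classification, named only for this sketch. */
enum sym_storage { SYM_UNDEFINED, SYM_TENTATIVE, SYM_DEFINED };

/* Classify a symbol by its section index alone. Anything that is
 * neither SHN_UNDEF nor SHN_COMMON (including SHN_ABS) is treated
 * as defined here, which is a simplification. */
static enum sym_storage classify_symbol(const Elf64_Sym *sym)
{
    assert(sym != NULL);
    if (sym->st_shndx == SHN_UNDEF)
        return SYM_UNDEFINED;   /* referenced, defined elsewhere */
    if (sym->st_shndx == SHN_COMMON)
        return SYM_TENTATIVE;   /* created, storage not yet assigned */
    return SYM_DEFINED;
}
```

Tentative symbols are mostly seen in relocatable object files; in a linked executable they have normally been given storage already, so they may not show up in .dynsym at all.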

Edit:

If you are talking about the .plt: symbols from shared libraries are bound lazily.

For how to control the binding, see http://www.linuxjournal.com/article/1060:

This feature is known as lazy symbol binding. The idea is that if you have lots of shared libraries, it could take the dynamic loader lots of time to look up all of the functions to initialize all of the .plt slots, so it would be preferable to defer binding addresses to the functions until we actually need them. This turns out to be a big win if you only end up using a small fraction of the functions in a shared library. It is possible to instruct the dynamic loader to bind addresses to all of the .plt slots before transferring control to the application—this is done by setting the environment variable LD_BIND_NOW=1 before running the program. This turns out to be useful in some cases when you are debugging a program, for example. Also, I should point out that the .plt is in read-only memory. Thus the addresses used for the target of the jump are actually stored in the .got section. The .got also contains a set of pointers for all of the global variables that are used within a program that come from a shared library.
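The mechanism in that passage can be imitated in plain C with a function pointer. This is only a toy model: the real binding is done by the dynamic loader and the PLT stubs, not by code like this, and every name below is made up for the illustration.

```c
/* Toy model of lazy binding. All names are hypothetical. */
typedef int (*fn_t)(int);

static int resolve_count = 0;   /* how many times "binding" has run */

/* Stands in for a function that lives in a shared library. */
static int real_double(int x) { return 2 * x; }

static int lazy_stub(int x);    /* forward declaration */

/* Stands in for a .got slot: it starts out pointing at the resolver. */
static fn_t got_slot = lazy_stub;

/* Stands in for the resolver: on the first call it patches the slot
 * with the real target, then forwards the call. */
static int lazy_stub(int x)
{
    resolve_count++;
    got_slot = real_double;
    return got_slot(x);
}

/* Stands in for a .plt entry: an indirect jump through the slot. */
static int call_via_slot(int x) { return got_slot(x); }
```

The first call_via_slot goes through lazy_stub and patches the slot; later calls jump straight to real_double. In this model, LD_BIND_NOW=1 would correspond to filling got_slot with real_double before the first call.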

incompetent answered Sep 17 '22 19:09