
meaning of an entry in a relocation table of an object file

I am having trouble understanding the entries of the relocation table of an object file compiled from C source. My programs are below:

//a.c
extern int shared;
int main(){
    int a = 100;
    swap(&a, &shared);
    a = 200;
    shared = 1;
    swap(&a, &shared);
}
//b.c
int shared = 1;
void swap(int* a, int* b) {
    if (a != b)
        *b ^= *a ^= *b, *a ^= *b;
}

I compile them with gcc -c -fno-stack-protector a.c b.c and link them with ld a.o b.o -e main -o ab. Then I run objdump -r a.o to inspect the relocation table of a.o:

RELOCATION RECORDS FOR [.text]:
OFFSET           TYPE              VALUE 
0000000000000014 R_X86_64_32       shared
0000000000000021 R_X86_64_PC32     swap-0x0000000000000004
000000000000002e R_X86_64_PC32     shared-0x0000000000000008
000000000000003b R_X86_64_32       shared
0000000000000048 R_X86_64_PC32     swap-0x0000000000000004

The disassembly of a.o is

Disassembly of section .text:

0000000000000000 <main>:
0:  55                      push   %rbp
1:  48 89 e5                mov    %rsp,%rbp
4:  48 83 ec 10             sub    $0x10,%rsp
8:  c7 45 fc 64 00 00 00    movl   $0x64,-0x4(%rbp)
f:  48 8d 45 fc             lea    -0x4(%rbp),%rax
13: be 00 00 00 00          mov    $0x0,%esi
18: 48 89 c7                mov    %rax,%rdi
1b: b8 00 00 00 00          mov    $0x0,%eax
20: e8 00 00 00 00          callq  25 <main+0x25>
25: c7 45 fc c8 00 00 00    movl   $0xc8,-0x4(%rbp)
2c: c7 05 00 00 00 00 01    movl   $0x1,0x0(%rip)  # 36 <main+0x36>
33: 00 00 00 
36: 48 8d 45 fc             lea    -0x4(%rbp),%rax
3a: be 00 00 00 00          mov    $0x0,%esi
3f: 48 89 c7                mov    %rax,%rdi
42: b8 00 00 00 00          mov    $0x0,%eax
47: e8 00 00 00 00          callq  4c <main+0x4c>
4c: b8 00 00 00 00          mov    $0x0,%eax
51: c9                      leaveq 
52: c3                      retq  

My question is: shared at offset 0x14 and shared at offset 0x2e are exactly the same object. Why do they appear with different symbol names (shared and shared-0x0000000000000008)?

BecomeBetter asked Sep 07 '18 04:09




1 Answer

Both entries refer to the same address; what differs is the relocation type. The relocation types are defined in the x86-64 ABI.

What is the difference?

At 0x14 and 0x3b: the address of the global variable shared must be loaded into %esi (the lower half of %rsi; a 32-bit write zero-extends into %rsi) so that it can be passed as the second argument to swap.

However, because the program was compiled with -mcmodel=small (the default for gcc, see also this question), the compiler can assume that the address fits into 32 bits and uses movl instead of movq, which would need more bytes to encode. (Without that assumption the compiler would actually use other instructions, but comparing movl with a "naive" movq explains the difference well.)

Thus, the resulting relocation is R_X86_64_32 (i.e. the 64-bit address truncated to 32 bits without sign extension) and not R_X86_64_64: the linker writes the lower 4 bytes of the address over the placeholder, which is also 4 bytes wide.
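As a rough illustration of what the linker computes for R_X86_64_32, here is a minimal C sketch. The function name reloc_abs32 and the sample address used below are hypothetical (they come neither from the ABI nor from any linker source). The ABI formula for this type is S + A, and the result must survive truncation to 32 bits followed by zero extension back to 64 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of any real linker.
 * Computes the 4-byte field for an R_X86_64_32 relocation:
 *   s = value (address) of the symbol
 *   a = addend stored in the relocation entry
 * Returns false when S + A does not fit into an unsigned 32-bit field,
 * i.e. when zero-extending the truncated value would not give S + A back. */
bool reloc_abs32(uint64_t s, int64_t a, uint32_t *field)
{
    assert(field != NULL);

    uint64_t value = s + (uint64_t)a;   /* S + A, modulo 2^64 */
    if (value > UINT32_MAX)
        return false;                   /* would be truncated: overflow */

    *field = (uint32_t)value;           /* the lower 4 bytes of the address */
    return true;
}
```

With an assumed S = 0x404000 and A = 0, the field receives 0x404000. For any address at or above 4 GiB the function reports failure, which corresponds to the case where the small code model assumption does not hold.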

At 0x2e the value 1 is to be written to the memory at address shared. However, the target address is given relative to %rip, i.e. relative to 0x36:

movl   $0x1,0x0(%rip)  # 36 <main+0x36>

Obviously, just putting the absolute address of shared via R_X86_64_32 won't do any good - a more complicated calculation is needed and this is what R_X86_64_PC32 is for.

Once again, because of the small code model, the compiler can assume that a 32-bit rip-relative offset is enough (so the relocation is R_X86_64_PC32 and not R_X86_64_PC64), and the placeholder is only 4 bytes wide.

Taken from the x86-64-abi, the formula for the relocation is (section 4.4):

result = S+A-P (32bit-word, i.e. the lower 4 bytes of the result) 
S = the value of the symbol whose index resides in the relocation entry 
A = the addend used to compute the value of the relocatable field 
P = the place (section offset or address) of the storage unit being relocated (computed using r_offset)

That means:

  • S is the address of the shared variable.
  • A is -8 (this can be seen, for example, by calling readelf -r a.o or objdump -r a.o), because there is a difference of 8 bytes between the offset of the relocation, 0x2e, and the value of %rip when the instruction executes, 0x36.
  • P is the offset of the relocation, i.e. 0x2e. P-A is the address in %rip.
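The addends in the relocation table follow directly from the disassembly: each one is the negated distance from the start of the 4-byte placeholder to the start of the next instruction, which is the value %rip holds when the instruction executes. A small sketch, assuming that convention; the helper name pcrel_addend is made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: derives the addend of a PC-relative relocation
 * from two section offsets taken from the disassembly:
 *   field_offset     = offset of the 4-byte placeholder (r_offset)
 *   next_insn_offset = offset of the following instruction (value of %rip)
 * The addend compensates for %rip pointing past the placeholder. */
int64_t pcrel_addend(uint64_t field_offset, uint64_t next_insn_offset)
{
    assert(next_insn_offset >= field_offset);
    return -(int64_t)(next_insn_offset - field_offset);
}
```

For the movl at 0x2c, the placeholder starts at 0x2e and the next instruction at 0x36, so pcrel_addend(0x2e, 0x36) gives -8: 4 bytes of displacement plus the 4-byte immediate $0x1 that follows it. For the calls, the placeholder is the last field of the instruction, so pcrel_addend(0x21, 0x25) and pcrel_addend(0x48, 0x4c) both give -4, matching swap-0x0000000000000004 in the table.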

As you can see, the result is not S, as in the case of R_X86_64_32 above, but S - (P-A). This can also be seen in the resulting binary: different values are patched into the placeholders for these two relocation types.
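To make the S + A - P calculation concrete, here is a minimal C sketch with made-up addresses. The function name reloc_pc32 and the addresses 0x401000 (start of a.o's .text in the output) and 0x404000 (address of shared) are assumptions chosen for illustration; the real values depend on how ld lays out the executable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of any real linker.
 * Computes the 4-byte field for an R_X86_64_PC32 relocation:
 *   s = value (address) of the symbol
 *   a = addend stored in the relocation entry
 *   p = address of the placeholder being patched
 * Returns false when S + A - P does not fit into a signed 32-bit field. */
bool reloc_pc32(uint64_t s, int64_t a, uint64_t p, int32_t *field)
{
    assert(field != NULL);

    /* S + A - P modulo 2^64, reinterpreted as signed
     * (two's complement conversion, as done by gcc and clang). */
    int64_t value = (int64_t)(s + (uint64_t)a - p);
    if (value < INT32_MIN || value > INT32_MAX)
        return false;                   /* offset too large: overflow */

    *field = (int32_t)value;
    return true;
}
```

With the assumed layout, P = 0x401000 + 0x2e = 0x40102e, so the field becomes 0x404000 - 8 - 0x40102e = 0x2fca. When the instruction executes, %rip is 0x401036, and 0x401036 + 0x2fca = 0x404000, the address of shared.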


There is a great article about this topic from Eli Bendersky.

ead answered Oct 14 '22 20:10
