
meaning of an entry in a relocation table of an object file

Tags:

c

x86-64

linker

relocation

I'm having trouble understanding the entries in the relocation table of an object file compiled from C source. My programs are as follows:

//a.c
extern int shared;
int main(){
    int a = 100;
    swap(&a, &shared);
    a = 200;
    shared = 1;
    swap(&a, &shared);
}
//b.c
int shared = 1;
void swap(int* a, int* b) {
    if (a != b)
        *b ^= *a ^= *b, *a ^= *b;
}

I compile and link them with gcc -c -fno-stack-protector a.c b.c and ld a.o b.o -e main -o ab. Then I run objdump -r a.o to inspect the relocation table.

RELOCATION RECORDS FOR [.text]:
OFFSET           TYPE              VALUE 
0000000000000014 R_X86_64_32       shared
0000000000000021 R_X86_64_PC32     swap-0x0000000000000004
000000000000002e R_X86_64_PC32     shared-0x0000000000000008
000000000000003b R_X86_64_32       shared
0000000000000048 R_X86_64_PC32     swap-0x0000000000000004

The disassembly of a.o is

Disassembly of section .text:

0000000000000000 <main>:
0:  55                      push   %rbp
1:  48 89 e5                mov    %rsp,%rbp
4:  48 83 ec 10             sub    $0x10,%rsp
8:  c7 45 fc 64 00 00 00    movl   $0x64,-0x4(%rbp)
f:  48 8d 45 fc             lea    -0x4(%rbp),%rax
13: be 00 00 00 00          mov    $0x0,%esi
18: 48 89 c7                mov    %rax,%rdi
1b: b8 00 00 00 00          mov    $0x0,%eax
20: e8 00 00 00 00          callq  25 <main+0x25>
25: c7 45 fc c8 00 00 00    movl   $0xc8,-0x4(%rbp)
2c: c7 05 00 00 00 00 01    movl   $0x1,0x0(%rip)  # 36 <main+0x36>
33: 00 00 00 
36: 48 8d 45 fc             lea    -0x4(%rbp),%rax
3a: be 00 00 00 00          mov    $0x0,%esi
3f: 48 89 c7                mov    %rax,%rdi
42: b8 00 00 00 00          mov    $0x0,%eax
47: e8 00 00 00 00          callq  4c <main+0x4c>
4c: b8 00 00 00 00          mov    $0x0,%eax
51: c9                      leaveq 
52: c3                      retq

My question is: shared at 0x14 and shared at 0x2e refer to the same object. Why do they have different symbol names?

226

asked Sep 07 '18 04:09

BecomeBetter

1 Answer

Both entries refer to the same address; what differs is the relocation type. The relocation types are defined in the x86-64 ABI.

What is the difference?

At 0x14 and 0x3b: the address of the global variable shared must be moved to register %rsi in order to call the function swap.

However, because the program was compiled with -mcmodel=small (the default for gcc, see also this question), the compiler can assume that the address fits into 32 bits and uses movl instead of movq, which would need more bytes to encode. (The compiler would actually use other instructions otherwise, but comparing movl with a "naive" movq explains the difference pretty well.)

Thus, the resulting relocation is R_X86_64_32 (i.e. the 64-bit address truncated to 32 bits without sign extension) and not R_X86_64_64: the linker writes the lower 4 bytes of the address in place of the placeholder, which is also 4 bytes wide.
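As a rough illustration of what the linker does for this relocation type, here is a minimal sketch in C. The function name reloc_abs32 is made up for this example and is not part of any real linker; the rule it follows (value is S + A, which must zero-extend back to the full 64-bit value) is the one the ABI gives for R_X86_64_32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only.
 * R_X86_64_32: the 4-byte field receives S + A. The value must
 * zero-extend back to the full 64-bit result; otherwise a real
 * linker reports a relocation overflow. */
bool reloc_abs32(uint64_t sym_addr, int64_t addend, uint32_t *field)
{
    assert(field != NULL);
    uint64_t value = sym_addr + (uint64_t)addend;   /* S + A */
    if (value > UINT32_MAX)
        return false;               /* address does not fit in 32 bits */
    *field = (uint32_t)value;       /* low 4 bytes replace the placeholder */
    return true;
}
```

For example, if shared happened to end up at the (assumed) address 0x6001b8, the four zero bytes at offsets 0x14 and 0x3b would both be replaced by 0x006001b8, stored in little-endian order.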

At 0x2e, the value 1 is to be written to the memory address of shared. However, the target address is given relative to %rip, i.e. relative to 0x36:

movl   $0x1,0x0(%rip)  # 36 <main+0x36>

Obviously, just putting the absolute address of shared there via R_X86_64_32 would not work: a more complicated calculation is needed, and this is what R_X86_64_PC32 is for.

Once again, because of the small code model, the compiler can assume that a 32-bit rip-relative offset is enough (hence the relocation R_X86_64_PC32 and not R_X86_64_PC64), and the placeholder is only 4 bytes wide.

Taken from the x86-64 ABI (section 4.4), the formula for the relocation is:

result = S+A-P (32bit-word, i.e. the lower 4 bytes of the result) 
S = the value of the symbol whose index resides in the relocation entry 
A = the addend used to compute the value of the relocatable field 
P = the place (section offset or address) of the storage unit being relocated (computed using r_offset)

That means:

S is the address of the shared variable.
A is -8 (this can be seen, for example, by running readelf -r a.o or objdump -r a.o), because there is a difference of 8 bytes between the offset of the relocation, 0x2e, and the actual value of %rip, 0x36.
P is the offset of the relocation, i.e. 0x2e. P-A is the address in %rip.

As you can see, the result is not S as in the case of R_X86_64_32 above, but S - (P-A). This can also be seen in the resulting binary: different values are patched into the placeholders for these two relocation types.
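To make the arithmetic concrete, here is a small sketch in C of the S + A - P calculation and of how the CPU later turns the patched displacement back into an address. The function names are invented for this example, and the addresses in the note below are assumptions chosen for illustration, not values read from the actual ab binary.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only.
 * R_X86_64_PC32: the 4-byte field receives S + A - P. */
int32_t reloc_pc32(uint64_t s, int64_t a, uint64_t p)
{
    /* Computed modulo 2^64, then read as a signed distance
     * (assumes the usual two's-complement conversion). */
    int64_t value = (int64_t)(s + (uint64_t)a - p);
    /* A real linker reports an overflow error instead of asserting. */
    assert(value >= INT32_MIN && value <= INT32_MAX);
    return (int32_t)value;
}

/* Address the CPU accesses: %rip (address of the next instruction)
 * plus the sign-extended 32-bit displacement. */
uint64_t rip_relative_target(uint64_t rip, int32_t disp)
{
    return rip + (uint64_t)(int64_t)disp;
}
```

Assume main is placed at 0x4000e8 and shared at 0x6001b8. Then P = 0x4000e8 + 0x2e = 0x400116, and reloc_pc32(0x6001b8, -8, 0x400116) gives 0x20009a. At run time %rip is 0x4000e8 + 0x36 = 0x40011e, and rip_relative_target(0x40011e, 0x20009a) yields 0x6001b8, the address of shared. With R_X86_64_32 the patched value would have been 0x6001b8 itself.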

There is a great article about this topic from Eli Bendersky.

188

answered Oct 14 '22 20:10

ead
