In my Linux program, I need a function that takes an address addr and checks whether a callq instruction placed at addr is calling a specific function func loaded from a shared library. That is, I need to check whether I have something like callq func@PLT at addr.
So, on Linux, how do I get from a callq func@PLT instruction to the real address of the function func?
You can only find out about that at runtime, after the dynamic linker resolves the actual load address.
Warning: What follows is slightly deeper magic ...
To illustrate what's happening, use a debugger on a small test program:
#include <stdio.h>
int main(int argc, char **argv) { printf("Hello, World!\n"); return 0; }
Compile it (gcc -O8 ...). objdump -d on the binary shows the following (the optimization that substitutes puts() for printf() when given a plain string notwithstanding ...):
Disassembly of section .init:
[ ... ]
Disassembly of section .plt:

0000000000400408 <__libc_start_main@plt-0x10>:
  400408: ff 35 a2 04 10 00    pushq  1049762(%rip)    # 5008b0 <_GLOBAL_OFFSET_TABLE_+0x8>
  40040e: ff 25 a4 04 10 00    jmpq   *1049764(%rip)   # 5008b8 <_GLOBAL_OFFSET_TABLE_+0x10>
[ ... ]
0000000000400428 <puts@plt>:
  400428: ff 25 9a 04 10 00    jmpq   *1049754(%rip)   # 5008c8 <_GLOBAL_OFFSET_TABLE_+0x20>
  40042e: 68 01 00 00 00       pushq  $0x1
  400433: e9 d0 ff ff ff       jmpq   400408 <_init+0x18>
[ ... ]
0000000000400500 <main>:
  400500: 48 83 ec 08          sub    $0x8,%rsp
  400504: bf 0c 06 40 00       mov    $0x40060c,%edi
  400509: e8 1a ff ff ff       callq  400428 <puts@plt>
  40050e: 31 c0                xor    %eax,%eax
  400510: 48 83 c4 08          add    $0x8,%rsp
  400514: c3                   retq
Now load it into gdb. Then:
$ gdb ./tcc
GNU gdb Red Hat Linux (6.3.0.0-0.30.1rh)
[ ... ]
(gdb) x/3i 0x400428
0x400428: jmpq *1049754(%rip) # 0x5008c8 <_GLOBAL_OFFSET_TABLE_+32>
0x40042e: pushq $0x1
0x400433: jmpq 0x400408
(gdb) x/gx 0x5008c8
0x5008c8 <_GLOBAL_OFFSET_TABLE_+32>: 0x000000000040042e
Notice this value points back to the instruction directly following the first jmpq; this means the puts@plt slot, on first invocation, will simply "fall through" to:
(gdb) x/3i 0x400408
0x400408: pushq 1049762(%rip) # 0x5008b0 <_GLOBAL_OFFSET_TABLE_+8>
0x40040e: jmpq *1049764(%rip) # 0x5008b8 <_GLOBAL_OFFSET_TABLE_+16>
0x400414: nop
(gdb) x/gx 0x5008b0
0x5008b0 <_GLOBAL_OFFSET_TABLE_+8>: 0x0000000000000000
(gdb) x/gx 0x5008b8
0x5008b8 <_GLOBAL_OFFSET_TABLE_+16>: 0x0000000000000000
Neither the function address (at GOT+16) nor its argument (at GOT+8) is initialized yet.
This is the state just after program load, but before executing. Now start executing it:
(gdb) break main
Breakpoint 1 at 0x400500
(gdb) run
Starting program: tcc
(no debugging symbols found)
(no debugging symbols found)

Breakpoint 1, 0x0000000000400500 in main ()
(gdb) x/i 0x400428
0x400428: jmpq *1049754(%rip) # 0x5008c8 <_GLOBAL_OFFSET_TABLE_+32>
(gdb) x/gx 0x5008c8
0x5008c8 <_GLOBAL_OFFSET_TABLE_+32>: 0x000000000040042e
So this hasn't changed yet - but the targets (the GOT contents for the libc initialization) are different now:
(gdb) x/gx 0x5008b0
0x5008b0 <_GLOBAL_OFFSET_TABLE_+8>: 0x0000002a9566b9a8
(gdb) x/gx 0x5008b8
0x5008b8 <_GLOBAL_OFFSET_TABLE_+16>: 0x0000002a955609f0
(gdb) disas 0x0000002a955609f0
Dump of assembler code for function _dl_runtime_resolve:
0x0000002a955609f0 <_dl_runtime_resolve+0>: sub $0x38,%rsp
[ ... ]
I.e. at program load time, the dynamic linker will resolve the "init" parts first. It substitutes the GOT references with pointers that redirect into the dynamic linking code.
Therefore, when an external-to-the-binary function is first called through its .plt reference, it'll jump into the linker again. Let it do that, then inspect the program afterwards - the state has changed again:
(gdb) break *0x0000000000400514
Breakpoint 2 at 0x400514
(gdb) continue
Continuing.
Hello, World!

Breakpoint 2, 0x0000000000400514 in main ()
(gdb) x/i 0x400428
0x400428: jmpq *1049754(%rip) # 0x5008c8 <_GLOBAL_OFFSET_TABLE_+32>
(gdb) x/gx 0x5008c8
0x5008c8 <_GLOBAL_OFFSET_TABLE_+32>: 0x0000002a956c8870
(gdb) disas 0x0000002a956c8870
Dump of assembler code for function puts:
0x0000002a956c8870 <puts+0>: mov %rbx,0xffffffffffffffe0(%rsp)
[ ... ]
So there's your redirect right into libc now - the PLT reference to puts() finally got resolved.
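The same walk can be done in code rather than in gdb. What follows is only a minimal sketch, assuming x86-64 and the exact stub layout shown in the dumps above (a 5-byte e8 call into a stub that begins with ff 25, i.e. jmpq *disp32(%rip)). The names call_rel32_target, plt_stub_got_slot, call_reaches and the plt_call enum are made up for illustration. Binaries built with CET/IBT support typically start their stubs with endbr64, which this sketch does not handle.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Address that a 5-byte "call rel32" (opcode E8) at insn transfers to, or 0. */
static uintptr_t call_rel32_target(const uint8_t *insn)
{
    int32_t rel;

    if (insn[0] != 0xE8)
        return 0;
    memcpy(&rel, insn + 1, sizeof rel);           /* little-endian displacement */
    return (uintptr_t)insn + 5 + (intptr_t)rel;   /* relative to the next insn  */
}

/* Address of the GOT slot read by a stub starting with FF 25 disp32, or 0. */
static uintptr_t plt_stub_got_slot(const uint8_t *stub)
{
    int32_t disp;

    if (stub[0] != 0xFF || stub[1] != 0x25)
        return 0;
    memcpy(&disp, stub + 2, sizeof disp);
    return (uintptr_t)stub + 6 + (intptr_t)disp;  /* %rip is the next insn */
}

enum plt_call { PLT_CALL_NO, PLT_CALL_UNRESOLVED, PLT_CALL_YES };

/*
 * Does the call at addr go through a PLT-style stub whose GOT slot
 * currently holds func? PLT_CALL_UNRESOLVED means the slot still points
 * back into the stub (lazy binding has not happened yet), so nothing can
 * be said about the final target from the slot value alone.
 */
static enum plt_call call_reaches(const void *addr, const void *func)
{
    uintptr_t stub, slot, value;

    assert(addr != NULL);
    stub = call_rel32_target(addr);
    if (!stub)
        return PLT_CALL_NO;
    slot = plt_stub_got_slot((const uint8_t *)stub);
    if (!slot)
        return PLT_CALL_NO;
    memcpy(&value, (const void *)slot, sizeof value);
    if (value == stub + 6)                        /* points at the stub's pushq */
        return PLT_CALL_UNRESOLVED;
    return value == (uintptr_t)func ? PLT_CALL_YES : PLT_CALL_NO;
}
```

In the example binary, call_reaches((void *)0x400509, func) would follow 0x400509 to 0x400428 to the slot at 0x5008c8. Before the first call to puts() it would report PLT_CALL_UNRESOLVED, because the slot still holds 0x40042e; running the program with LD_BIND_NOW=1 makes the dynamic linker fill the slots at startup instead. Here func has to be the address of the function inside the library, as gdb shows it after resolution.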
The instructions that tell the linker where to insert the actual function load addresses (as we've seen it do for _dl_runtime_resolve) come from special sections in the ELF binary:
$ readelf -a tcc
[ ... ]
Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
[ ... ]
  INTERP         0x0000000000000200 0x0000000000400200 0x0000000000400200
                 0x000000000000001c 0x000000000000001c  R      1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
[ ... ]
Dynamic section at offset 0x700 contains 21 entries:
  Tag                Type       Name/Value
 0x0000000000000001 (NEEDED)    Shared library: [libc.so.6]
[ ... ]
Relocation section '.rela.plt' at offset 0x3c0 contains 2 entries:
  Offset        Info         Type              Sym. Value       Sym. Name + Addend
0000005008c0  000100000007 R_X86_64_JUMP_SLO 0000000000000000 __libc_start_main + 0
0000005008c8  000200000007 R_X86_64_JUMP_SLO 0000000000000000 puts + 0
There's more to ELF than just the above, but between them these three pieces tell the kernel's binary format handler that this ELF binary has an interpreter (the dynamic linker) that needs to be loaded and initialized first, and tell that interpreter that the program requires libc.so.6 and that offsets 0x5008c0 and 0x5008c8 in the program's writable data section must be replaced with the load addresses of __libc_start_main and puts, respectively, when dynamic linking is actually performed.
How exactly that happens, from ELF's point of view, is up to the details of the interpreter (aka, the dynamic linker implementation).
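To make the .rela.plt listing above concrete, here is a minimal sketch of how such a table maps a symbol name to its GOT slot, using the types and macros from <elf.h>. The function name jump_slot_offset and its parameters are hypothetical. In a running process the three tables would be located through the DT_JMPREL, DT_SYMTAB and DT_STRTAB entries of the dynamic section, with the entry count being DT_PLTRELSZ divided by sizeof(Elf64_Rela); that lookup is left out here.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the r_offset of the R_X86_64_JUMP_SLOT relocation that refers to
 * the symbol called name, or 0 if there is none. In a non-PIE binary like
 * the one above, r_offset is the address of the GOT slot itself.
 */
static Elf64_Addr jump_slot_offset(const Elf64_Rela *rela, size_t count,
                                   const Elf64_Sym *symtab, const char *strtab,
                                   const char *name)
{
    assert(rela && symtab && strtab && name);

    for (size_t i = 0; i < count; i++) {
        /* The low 32 bits of r_info hold the relocation type ... */
        if (ELF64_R_TYPE(rela[i].r_info) != R_X86_64_JUMP_SLOT)
            continue;
        /* ... and the high 32 bits index the dynamic symbol table. */
        const Elf64_Sym *sym = &symtab[ELF64_R_SYM(rela[i].r_info)];
        if (strcmp(strtab + sym->st_name, name) == 0)
            return rela[i].r_offset;
    }
    return 0;
}
```

With the two entries from the readelf output, the Info value 000200000007 splits into symbol index 2 and type 7 (R_X86_64_JUMP_SLOT), so looking up puts yields 0x5008c8, the slot inspected in gdb earlier.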