I have disassembled a C program with Radare2. Inside this program there are many calls to scanf
like the following:
0x000011fe 488d4594 lea rax, [var_6ch]
0x00001202 4889c6 mov rsi, rax
0x00001205 488d3df35603. lea rdi, [0x000368ff] ; "%d" ; const char *format
0x0000120c b800000000 mov eax, 0
0x00001211 e86afeffff call sym.imp.__isoc99_scanf ; int scanf(const char *format)
0x00001216 8b4594 mov eax, dword [var_6ch]
0x00001219 83f801 cmp eax, 1 ; rsi ; "ELF\x02\x01\x01"
0x0000121c 740a je 0x1228
Here scanf has the address of the string "%d" passed to it by the line lea rdi, [0x000368ff]. I'm assuming 0x000368ff is the location of "%d" in the executable file, because if I restart Radare2 in debugging mode (r2 -d ./exec) then lea rdi, [0x000368ff] is replaced by lea rdi, [someMemoryAddress].
If lea rdi, [0x000368ff] is what's hard-coded in the file, then how does the instruction change to the actual memory address when run?
Radare is tricking you: what you see is not the real instruction. It has been simplified for you.
The real instruction is:
0x00001205 488d3df3560300 lea rdi, qword [rip + 0x356f3]
0x0000120c b800000000 mov eax, 0
This is a typical position-independent lea. The string is stored in your binary at offset 0x000368ff, but since the executable is position independent, the real address has to be calculated at runtime. While the lea executes, rip holds the address of the next instruction. Since that next instruction is at offset 0x0000120c, you know that, no matter where the binary is loaded in memory, the address you want will be rip + (0x000368ff - 0x0000120c) = rip + 0x356f3, which is what you see above.
When doing static analysis, Radare does not know the base address of the binary in memory, so it simply calculates 0x0000120c + 0x356f3 = 0x000368ff and shows that as the operand. This makes reverse engineering easier, but can be confusing since the real instruction is different.
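The calculation Radare performs can be reproduced by hand. The following is a minimal sketch, assuming the 7-byte lea encoding from the question; the function names lea_rip_disp32 and lea_rip_target and the array QUESTION_LEA are made up for this illustration and are not part of Radare2 or any library.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of the lea from the question: 48 8d 3d f3 56 03 00 */
static const uint8_t QUESTION_LEA[7] = {0x48, 0x8d, 0x3d, 0xf3, 0x56, 0x03, 0x00};

/* Extract the signed 32-bit displacement of a 7-byte
 * `lea r64, [rip + disp32]` (REX.W prefix, opcode 8d, ModRM, disp32). */
static int32_t lea_rip_disp32(const uint8_t insn[7])
{
    /* ModRM with mod=00 and rm=101 selects rip-relative addressing. */
    assert(insn[1] == 0x8d && (insn[2] & 0xc7) == 0x05);

    /* The displacement is stored little-endian in the last four bytes. */
    uint32_t raw = (uint32_t)insn[3]
                 | (uint32_t)insn[4] << 8
                 | (uint32_t)insn[5] << 16
                 | (uint32_t)insn[6] << 24;
    return (int32_t)raw;
}

/* Compute the address the lea produces when it sits at insn_addr. */
static uint64_t lea_rip_target(uint64_t insn_addr, const uint8_t insn[7])
{
    /* rip points at the NEXT instruction, i.e. 7 bytes further on. */
    uint64_t next_rip = insn_addr + 7;
    return next_rip + (uint64_t)(int64_t)lea_rip_disp32(insn);
}
```

Calling lea_rip_target(0x1205, QUESTION_LEA) mirrors what the static disassembly displays: the instruction's own offset is used as if the binary were loaded at address 0.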
As an example, take the following program:
#include <stdio.h>

int main(void) {
    puts("Hello world!");
}
When compiled, it produces:
6b4: 48 8d 3d 99 00 00 00 lea rdi,[rip+0x99]
6bb: e8 a0 fe ff ff call 560 <puts@plt>
So rip + 0x99 = 0x6bb + 0x99 = 0x754, and if we take a look at offset 0x754 in the binary with hd:
$ hd -s 0x754 -n 16 a.out
00000754 48 65 6c 6c 6f 20 77 6f 72 6c 64 21 00 00 00 00 |Hello world!....|
00000764
The full instruction is
48 8d 3d f3 56 03 00
This instruction is literally
lea rdi, [rip + 0x000356f3]
with a rip-relative addressing mode. In the unrelocated binary, the instruction pointer rip has the value 0x0000120c (the address of the next instruction) when this instruction is executed, thus rdi receives the desired value 0x000368ff.
If this is not the address you see at run time, your program is probably a position-independent executable (PIE), which the loader maps at a base address chosen at load time. Since the address is encoded using a rip-relative addressing mode, this instruction needs no relocation and yields the correct address regardless of where the binary is loaded.
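To see why no relocation is needed, here is a small sketch of the arithmetic, using the offsets from the question. The load base values are hypothetical, 4 KiB pages are assumed, and the names rdi_after_lea and string_offset are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the disassembly in the question. */
#define LEA_OFFSET 0x1205u  /* offset of the lea inside the image */
#define LEA_LENGTH 7u       /* 48 8d 3d f3 56 03 00 */
#define LEA_DISP   0x356f3u /* disp32 encoded in the instruction */

/* Value rdi receives if the image is mapped at load_base. */
static uint64_t rdi_after_lea(uint64_t load_base)
{
    /* Assumption: the loader maps the image page-aligned (4 KiB pages). */
    assert(load_base % 0x1000 == 0);

    /* rip is the address of the instruction following the lea. */
    uint64_t rip = load_base + LEA_OFFSET + LEA_LENGTH;
    return rip + LEA_DISP;
}

/* Offset of the string from the start of the image: the base cancels out. */
static uint64_t string_offset(uint64_t load_base)
{
    return rdi_after_lea(load_base) - load_base;
}
```

Whatever base is passed in, string_offset returns 0x368ff, while rdi_after_lea returns the base plus that offset. This matches what the debugger shows: the absolute address changes from run to run, but the encoded displacement never has to be patched.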