I have written a simple Hello World program.
#include <stdio.h>
int main() {
printf("Hello World");
return 0;
}
I wanted to understand what the relocatable object file and the executable file look like. The disassembly of the main function in the object file is
0000000000000000 <main>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: bf 00 00 00 00 mov $0x0,%edi
9: b8 00 00 00 00 mov $0x0,%eax
e: e8 00 00 00 00 callq 13 <main+0x13>
13: b8 00 00 00 00 mov $0x0,%eax
18: c9 leaveq
19: c3 retq
Here the function call for printf is callq 13. One thing I don't understand is why it is 13. That means call the function at address 13, right? But 13 holds the next instruction, right? Please explain what this means.
The executable code corresponding to main is
00000000004004cc <main>:
4004cc: 55 push %rbp
4004cd: 48 89 e5 mov %rsp,%rbp
4004d0: bf dc 05 40 00 mov $0x4005dc,%edi
4004d5: b8 00 00 00 00 mov $0x0,%eax
4004da: e8 e1 fe ff ff callq 4003c0 <printf@plt>
4004df: b8 00 00 00 00 mov $0x0,%eax
4004e4: c9 leaveq
4004e5: c3 retq
Here it is callq 4003c0. But the binary instruction is e8 e1 fe ff ff. There is nothing that corresponds to 4003c0. What am I getting wrong?
Thanks. Bala
A relocatable object file holds sections containing code and data. This file is suitable to be linked with other relocatable object files to create dynamic executable files, shared object files, or another relocatable object. A dynamic executable file holds a program that is ready to execute.
A relocatable file holds code and data suitable to be linked with other object files to create an executable or shared object file, or another relocatable object. An executable file holds a program that is ready to execute. The file specifies how exec(2) creates a program's process image.
A relocatable file holds code and data suitable for linking with other object files to create an executable or a shared object file. An executable file holds a program suitable for execution; the file specifies how exec (BA_OS) creates a program's process image.
In the first case, take a look at the instruction encoding: the bytes where the function address would go are all zeroes. That's because the object hasn't been linked yet, so the addresses of external symbols haven't been filled in. When you do the final link into the executable format, the linker puts another placeholder in there (the call goes to a stub, printf@plt in your listing), and the dynamic linker finally supplies the correct address for printf() at runtime. Here's a quick example for a "Hello, world" program I wrote.
First, the disassembly of the object file:
00000000 <_main>:
0: 8d 4c 24 04 lea 0x4(%esp),%ecx
4: 83 e4 f0 and $0xfffffff0,%esp
7: ff 71 fc pushl -0x4(%ecx)
a: 55 push %ebp
b: 89 e5 mov %esp,%ebp
d: 51 push %ecx
e: 83 ec 04 sub $0x4,%esp
11: e8 00 00 00 00 call 16 <_main+0x16>
16: c7 04 24 00 00 00 00 movl $0x0,(%esp)
1d: e8 00 00 00 00 call 22 <_main+0x22>
22: b8 00 00 00 00 mov $0x0,%eax
27: 83 c4 04 add $0x4,%esp
2a: 59 pop %ecx
2b: 5d pop %ebp
2c: 8d 61 fc lea -0x4(%ecx),%esp
2f: c3 ret
Then the relocations:
main.o: file format pe-i386
RELOCATION RECORDS FOR [.text]:
OFFSET TYPE VALUE
00000012 DISP32 ___main
00000019 dir32 .rdata
0000001e DISP32 _puts
As you can see, there's a relocation there for _puts, which is what the call to printf turned into. That relocation will get noticed at link time and fixed up. In the case of dynamic library linking, the relocations and fixups might not get fully resolved until the program is running, but you'll get the idea from this example, I hope.
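To make the "fixed up" step concrete, here is a minimal sketch of what patching one 32-bit PC-relative field could look like. Note that each DISP32 offset in the listing (0x12, 0x1e) points one byte past an e8 opcode (0x11, 0x1d), i.e. at the 4-byte displacement field. The function name apply_rel32 and its parameters are made up for illustration, and the formula is simplified (it ignores any addend already stored in the field); a real linker does considerably more.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a real linker API.
 * Patches the 4-byte displacement field at text[field_off] so that a
 * PC-relative call reaches `target`.
 *   text      - bytes of the code section being patched
 *   text_base - address at which text[0] will be loaded
 *   field_off - offset of the displacement field (one past the e8 opcode)
 *   target    - final address of the called symbol
 * Simplifying assumption: the field's existing contents (addend) are zero. */
void apply_rel32(uint8_t *text, uint64_t text_base,
                 size_t field_off, uint64_t target)
{
    /* The displacement is measured from the end of the 4-byte field,
     * which is the address of the next instruction. */
    uint64_t next_insn = text_base + field_off + 4;
    uint32_t disp = (uint32_t)(target - next_insn);

    /* x86 stores multi-byte values least significant byte first. */
    text[field_off + 0] = (uint8_t)(disp & 0xff);
    text[field_off + 1] = (uint8_t)((disp >> 8) & 0xff);
    text[field_off + 2] = (uint8_t)((disp >> 16) & 0xff);
    text[field_off + 3] = (uint8_t)((disp >> 24) & 0xff);
}
```

Fed with the numbers from the question's executable (main at 0x4004cc, displacement field at offset 0xf, target 0x4003c0), this writes the bytes e1 fe ff ff, matching the linked listing.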
The target of the call in the E8 instruction (call) is specified as a relative offset from the current instruction pointer (IP) value.
In your first code sample the offset is obviously 0x00000000. It basically says
call +0
The actual address of printf is not known yet, so the compiler just put the 32-bit value 0x00000000 there as a placeholder.
Such an incomplete call with a zero offset will naturally be interpreted as a call to the current IP value. On your platform, the IP is pre-incremented, meaning that when an instruction is executed, the IP already contains the address of the next instruction. I.e., when the instruction at address 0xE is executed, the IP contains the value 0x13, and call +0 is naturally interpreted as a call to the instruction at 0x13. This is why you see 0x13 in the disassembly of the incomplete code.
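The rule a disassembler applies here can be written down in a few lines. This is only a sketch: call_target and CALL_REL32_LEN are names invented for the example, and it covers just the 5-byte E8 form of call discussed above.

```c
#include <stdint.h>

/* A near call with a 32-bit displacement is 5 bytes long:
 * the E8 opcode followed by a 4-byte signed offset. */
enum { CALL_REL32_LEN = 5 };

/* Illustrative helper: the target is the address of the NEXT
 * instruction plus the signed displacement. */
uint64_t call_target(uint64_t insn_addr, int32_t rel32)
{
    return insn_addr + CALL_REL32_LEN + (uint64_t)(int64_t)rel32;
}
```

With the object file's values, call_target(0xe, 0) gives 0x13, which is exactly the "callq 13" in the disassembly.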
Once the code is complete, the placeholder 0x00000000 offset is replaced with the actual offset of the printf function in the code. The offset can be positive (forward) or negative (backward). In your case the IP at the moment of the call is 0x4004DF, while the address of the printf function is 0x4003C0. For this reason, the machine instruction will contain a 32-bit offset value equal to 0x4003C0 - 0x4004DF, which is the negative value -287. So what you see in the code is actually
call -287
-287 is 0xFFFFFEE1 as a 32-bit two's-complement value in hexadecimal. This is exactly what you see in your machine code. The bytes only look backwards because x86 stores multi-byte values in little-endian order, least significant byte first, so the disassembler prints them as e1 fe ff ff.
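If you want to check the arithmetic yourself, the following sketch decodes the five instruction bytes by hand. The helper names read_rel32_le and decode_call_target are hypothetical, and the code assumes the first byte has already been checked to be the E8 opcode.

```c
#include <stdint.h>

/* Assemble four little-endian bytes into a signed 32-bit displacement. */
int32_t read_rel32_le(const uint8_t b[4])
{
    uint32_t u = (uint32_t)b[0]
               | ((uint32_t)b[1] << 8)
               | ((uint32_t)b[2] << 16)
               | ((uint32_t)b[3] << 24);

    /* Interpret u as two's complement without relying on an
     * implementation-defined unsigned-to-signed conversion. */
    int64_t v = (u & 0x80000000u) ? (int64_t)u - 0x100000000LL
                                  : (int64_t)u;
    return (int32_t)v;
}

/* insn points at the 5 bytes of an E8 call located at insn_addr.
 * Target = address of the next instruction + displacement. */
uint64_t decode_call_target(uint64_t insn_addr, const uint8_t insn[5])
{
    int32_t rel32 = read_rel32_le(insn + 1);
    return insn_addr + 5 + (uint64_t)(int64_t)rel32;
}
```

For the bytes e8 e1 fe ff ff at address 0x4004da, read_rel32_le yields -287 and decode_call_target yields 0x4003c0, the address objdump prints next to callq.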