It's from this question.
gcc -c test.s
objcopy -O binary test.o test.bin
What's the difference between test.o and test.bin?
.text
call start
str:
.string "test\n"
start:
movl $4, %eax
movl $1, %ebx
pop %ecx
movl $5, %edx
int $0x80
ret
What's the above doing?
objcopy -O binary copies the raw contents of the source file's sections and drops all metadata. Here, test.o is a "relocatable object file": that's code, and also a symbol table and relocation information, which allows the file to be linked with other files into an executable program. The test.bin file produced by objcopy contains the code only, no symbol table or relocation information. Such a "raw" file is useless for "normal" programming, but handy for code which has its own loader.
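One quick way to tell the two files apart is to look at their first bytes: an ELF object file such as test.o starts with the magic number 0x7f 'E' 'L' 'F', while the raw test.bin starts directly with code. The helper below is a minimal sketch of that check; the name looks_like_elf is made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the buffer starts with the 4-byte
   ELF magic number (0x7f 'E' 'L' 'F'), 0 otherwise. */
int looks_like_elf(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    assert(buf != NULL);
    if (len < sizeof magic)
        return 0;
    return memcmp(buf, magic, sizeof magic) == 0;
}
```

Fed the first bytes of test.o this should return 1; fed the first bytes of test.bin (0xe8 0x06 ...) it returns 0.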
I assume that you use Linux on a 32-bit x86 system. Your test.o file has size 515 bytes. If you try objdump -x test.o you get the following, which describes the contents of the test.o object file:
$ objdump -x test.o
test.o: file format elf32-i386
test.o
architecture: i386, flags 0x00000010:
HAS_SYMS
start address 0x00000000
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 0000001e 00000000 00000000 00000034 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .data 00000000 00000000 00000000 00000054 2**2
CONTENTS, ALLOC, LOAD, DATA
2 .bss 00000000 00000000 00000000 00000054 2**2
ALLOC
SYMBOL TABLE:
00000000 l d .text 00000000 .text
00000000 l d .data 00000000 .data
00000000 l d .bss 00000000 .bss
0000000b l .text 00000000 start
00000005 l .text 00000000 str
This gives you quite a lot of information. In particular, the file contains a section called .text beginning at offset 0x34 in the file (that's 52 in decimal) and of length 0x1e bytes (30 in decimal). You can disassemble it to see the opcodes themselves:
$ objdump -d test.o
test.o: file format elf32-i386
Disassembly of section .text:
00000000 <str-0x5>:
0: e8 06 00 00 00 call b <start>
00000005 <str>:
5: 74 65 je 6c <start+0x61>
7: 73 74 jae 7d <start+0x72>
9: 0a 00 or (%eax),%al
0000000b <start>:
b: b8 04 00 00 00 mov $0x4,%eax
10: bb 01 00 00 00 mov $0x1,%ebx
15: 59 pop %ecx
16: ba 05 00 00 00 mov $0x5,%edx
1b: cd 80 int $0x80
1d: c3 ret
This is more or less the assembly you started with. The je, jae and or opcodes in the middle are spurious: this is objdump trying to interpret the literal string ("test\n", resulting in the bytes 0x74 0x65 0x73 0x74 0x0a 0x00) as opcodes. objdump -d also shows you the actual bytes found in the .text section, i.e. the bytes in the file beginning at offset 0x34. The first bytes are 0xe8 0x06 0x00...
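To check which bytes the assembler emits for the .string directive, one can print the string, including its terminating zero byte, in hexadecimal. This is a small sketch; hex_bytes is a hypothetical helper, not part of any tool used here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats len bytes of src as space-separated
   two-digit hex into out. out must hold at least 3 * len bytes
   (and at least 1 byte when len is 0). */
void hex_bytes(const void *src, size_t len, char *out)
{
    const unsigned char *p = src;
    size_t i;

    assert(src != NULL && out != NULL);
    out[0] = '\0';
    for (i = 0; i < len; i++)
        out += sprintf(out, i ? " %02x" : "%02x", p[i]);
}
```

Calling hex_bytes("test\n", 6, out) fills out with "74 65 73 74 0a 00", which are the six bytes that objdump decodes as je, jae and or.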
Now, have a look at the test.bin file. It has length 30 bytes. Let's see those bytes in hexadecimal:
$ hd test.bin
00000000 e8 06 00 00 00 74 65 73 74 0a 00 b8 04 00 00 00 |.....test.......|
00000010 bb 01 00 00 00 59 ba 05 00 00 00 cd 80 c3 |.....Y........|
We recognize here exactly the 30 bytes from the .text section in test.o. That's what objcopy -O binary did: it extracted the file contents, i.e. the only non-empty section, i.e. the raw opcodes themselves, removing everything else, in particular the symbol table and relocation information.
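For this particular file, the effect can be imitated by copying 0x1e bytes starting at file offset 0x34. The sketch below does only that; it is not how objcopy is implemented, and copy_range is a name invented for the example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copies len bytes starting at byte offset off of
   stream in into buf. Returns 0 on success, -1 on a failed seek or a
   short read. */
int copy_range(FILE *in, long off, size_t len, unsigned char *buf)
{
    assert(in != NULL && buf != NULL);
    if (fseek(in, off, SEEK_SET) != 0)
        return -1;
    return fread(buf, 1, len, in) == len ? 0 : -1;
}
```

With test.o opened in binary mode, copy_range(f, 0x34, 0x1e, buf) should leave in buf the same 30 bytes that hd shows for test.bin.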
Relocation is about what must be changed in a given piece of code so that it runs properly when stored at a given place in memory. For instance, if the code uses a variable and wishes to obtain the address of that variable, then the relocation information will contain an entry telling whoever will actually place the code in memory (normally, the linker): "here in the code, when you know where the variable will actually be, write the variable address". Interestingly, the code you show needs no relocation: the call uses a displacement relative to its own position, and the address of the string is obtained at run time from the stack, so the sequence of bytes can be written at an arbitrary memory location and executed as is.
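The reason the call needs no relocation can be seen by decoding it by hand: the four bytes after the 0xe8 opcode are a little-endian displacement relative to the address of the next instruction, not an absolute address. The following sketch computes the target for a given load address; call_target is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: given a 5-byte "call rel32" instruction (opcode
   0xe8) and the address where it is loaded, returns the address it
   transfers control to. */
uint32_t call_target(const unsigned char *insn, uint32_t load_addr)
{
    uint32_t disp;

    assert(insn[0] == 0xe8);
    /* rel32 is stored little-endian right after the opcode byte */
    disp = (uint32_t)insn[1] | (uint32_t)insn[2] << 8 |
           (uint32_t)insn[3] << 16 | (uint32_t)insn[4] << 24;
    /* the displacement is relative to the address of the NEXT instruction */
    return load_addr + 5 + disp;
}
```

For the bytes e8 06 00 00 00, the target is 0x0b when the code is loaded at address 0 and 0x100b when it is loaded at 0x1000: always 0x0b bytes past the start, wherever the code ends up.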
Let's have a look at what the code does.
The call opcode jumps to the mov instruction at offset 0x0b. Also, since this is a call, it pushes the return address on the stack. The return address is where execution should continue after the call is completed, i.e. when a ret opcode is reached. This is the address of the byte following the call opcode. Here, that address is the address of the first byte of the literal string "test\n".
The first two movl instructions load %eax and %ebx with the numerical values 4 and 1, respectively.
The pop opcode removes the top element from the stack, storing it in %ecx. What is this top element? That's precisely the address pushed on the stack by the call opcode, i.e. the address of the first byte of the literal string.
The last movl loads %edx with the numerical value 5.
int $0x80 is the system call on 32-bit x86 Linux: this invokes the kernel. The kernel will look at the registers to know what to do. The kernel first looks at %eax to get the "system call number"; on 32-bit x86, "4" is __NR_write, i.e. the write() system call. This call expects three parameters, in registers %ebx, %ecx and %edx, in that order. These are the destination file descriptor (here 1: that's standard output), a pointer to the data to write (here the literal string), and the length of the data to write (here 5, which corresponds to the four letters and the newline character). So this writes "test\n" on standard output.
ret returns to the caller. ret pops a value from the stack, and jumps to that address. Since the pop has already consumed the address pushed by the code's own call, the value popped here must have been pushed by whoever invoked this chunk: this assumes that the chunk was invoked with a call opcode.
So, to sum up, the code prints out test with a newline.
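As a rough C equivalent of the walkthrough above, the whole chunk amounts to one write() call with the three arguments that the assembly places in %ebx, %ecx and %edx. This is only a sketch: print_test is a made-up name, and the file descriptor is a parameter here where the assembly hard-codes 1.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: writes the 5 bytes of "test\n" to file descriptor
   fd, as the assembly does with fd 1. Returns what write() returns. */
ssize_t print_test(int fd)
{
    static const char str[] = "test\n";

    assert(fd >= 0);
    return write(fd, str, 5);   /* %ebx = fd, %ecx = str, %edx = 5 */
}
```

print_test(1) prints test followed by a newline on standard output and returns 5 on success.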
Let's try it with a custom loader:
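#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>

int
main(void)
{
    void *p;
    int f;

    /* One readable, writable page of anonymous memory. */
    p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    /* Copy the 30 bytes of raw code into the page. */
    f = open("test.bin", O_RDONLY);
    read(f, p, 30);
    close(f);

    /* Switch the page from writable to executable, then call into it. */
    mprotect(p, 30, PROT_READ | PROT_EXEC);
    ((void (*)(void))p)();
    return 0;
}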
(The code above does not test returned values for errors, which is very bad, of course.)
Here, I allocate a page of memory (4096 bytes) with mmap(), asking for a page where I can read and write. p points to that chunk. Then, with open(), read() and close(), I read the contents of the test.bin file (30 bytes) into that chunk.
The mprotect() call instructs the kernel to change the access rights for my page: from now on, I want to be able to execute those bytes, i.e. consider them as machine code. I give up the right to write into the chunk (depending on the exact kernel configuration, having a page which can be both written to and executed may be forbidden).
The cryptic ((void (*)(void))p)(); reads as follows: I take p; I cast it to a pointer to a function which takes no argument and returns nothing; I invoke that function. This is C syntax for making a call into my chunk of data.
When I run that program, I get:
$ ./blah
test
which is what was expected: the code in test.bin writes out test on the standard output.