When I compile
#include <stdio.h>
int
main () {
return 0;
}
to x86 assembly the result is plain and expected:
$> cc -m32 -S main.c -o -|sed -r "/\s*\./d"
main:
pushl %ebp
movl %esp, %ebp
movl $0, %eax
popl %ebp
ret
But when studying different disassembled binaries, the function prologue is never that simple. Indeed, changing the C source above into
#include <stdio.h>
int
main () {
printf("Hi");
return 0;
}
the result is
$> cc -m32 -S main.c -o -|sed -r "/\s*\./d"
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
subl $4, %esp
subl $12, %esp
call printf
addl $16, %esp
movl $0, %eax
movl -4(%ebp), %ecx
leave
leal -4(%ecx), %esp
ret
In particular, I don't get why these instructions
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
are generated -- specifically why not directly storing %esp into %ecx, instead of into%esp+4?
If main isn't a leaf function, it needs to align the stack for the benefit of any functions it calls. Functions that aren't called main just maintain the stack's alignment.
lea 4(%esp), %ecx # ecx = esp+4
andl $-16, %esp
pushl -4(%ecx) # load from ecx-4 and push that
It's pushing a copy of the return address, so it will be in the right place after aligning the stack. You're right, a different sequence would be more sensible:
mov (%esp), %ecx ; or maybe even pop %ecx
andl $-16, %esp
push %ecx ; push (mem) is slower than push reg
As Youka says in comments, don't expect code from -O0 to be optimized at all. Use -Og for optimizations that don't interfere with debugability. The gcc manual recommends that for compile/debug/edit cycles. -O0 output is harder to read / understand / learn from than optimized code. It's easier to map back to the source, but it's terrible code.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With