If I compile an empty C function
void nothing(void)
{
}
using gcc -O2 -S (and clang) on MacOS, it generates:
_nothing:
pushq %rbp
movq %rsp, %rbp
popq %rbp
ret
Why does gcc not remove everything but the ret? It seems like an easy optimisation to make unless it really does something (seems not to, to me). This pattern (push/move at the beginning, pop at the end) is also visible in other non-empty functions where rbp is otherwise unused.
On Linux using a more recent gcc (4.4.5) I see just
nothing:
rep
ret
Why the rep? The rep is absent in non-empty functions.
Why the rep ?
The reasons are explained in this blog post. In short, jumping directly to a single-byte ret instruction would mess up the branch prediction on some AMD processors. And rather than adding a nop before the ret, a meaningless prefix byte was added to save instruction decoding bandwidth.
The rep is absent in non-empty functions.
To quote from the blog post I linked to: "[rep ret] is preferred to the simple ret either when it is the target of any kind of branch, conditional (jne/je/...) or unconditional (jmp/call/...)".
In the case of an empty function, the ret would have been the direct target of a call. In a non-empty function, it wouldn't be.
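The quoted rule also covers jumps, so a non-empty function can still end up with a ret that is a branch target. The following is a minimal sketch of such a case; the function name store_if_valid is hypothetical and not from the original post, and whether a given gcc actually emits rep ret for it depends on the compiler version and the processor it is tuned for.

```c
#include <stddef.h>

/* Hypothetical example: when p is NULL, the compiler will typically
 * branch around the store, and that conditional jump may land directly
 * on the function's return instruction. That is the situation the
 * quoted rule describes (a ret that is the target of a je/jne). */
void store_if_valid(int *p, int v)
{
    if (p != NULL)
        *p = v;
}
```

Compiling this with gcc -O2 -S and looking at the label the conditional jump targets is one way to check what your own compiler does.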
Why does gcc not remove everything but the ret?
It's possible that some compilers won't omit the frame pointer code even if you've specified -O2. At least with gcc, you can explicitly tell the compiler to omit it by using the -fomit-frame-pointer option.
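As a rough sketch of how to try this, the empty function from the question can be put in a file of its own and compiled with the option added. The file name nothing.c is an assumption made for illustration, and the resulting assembly is not guaranteed to be identical across gcc versions or platforms.

```c
/* nothing.c (hypothetical file name)
 *
 * Compile to assembly with the frame pointer explicitly omitted:
 *
 *     gcc -O2 -fomit-frame-pointer -S nothing.c
 *
 * then inspect nothing.s. If the option takes effect, the
 * pushq %rbp / movq %rsp, %rbp / popq %rbp sequence should no longer
 * appear around the ret. */
void nothing(void)
{
}
```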
As explained here: http://support.amd.com/us/Processor_TechDocs/25112.PDF, a two-byte near-return instruction (i.e. rep ret) is used because a single-byte return can be mispredicted on some amd64 processors in some situations, such as this one.
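To make "single-byte" and "two-byte" concrete, here is a small sketch of the two encodings as byte arrays. The array names are made up for illustration; the byte values are the standard x86 opcode for a near ret (0xC3) and the rep prefix (0xF3).

```c
/* Illustration only: machine-code bytes for the two forms of return.
 * The array names are hypothetical. */
static const unsigned char plain_ret[] = { 0xC3 };       /* ret     */
static const unsigned char rep_ret[]   = { 0xF3, 0xC3 }; /* rep ret */
```

The second form is the same ret opcode with one prefix byte in front of it, so it is still decoded as a single instruction.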
If you fiddle around with the processor targeted by gcc you may find that you can get it to generate a single-byte ret. -mtune=nocona worked for me.