
Why is an empty function not just a return

If I compile an empty C function

void nothing(void)
{
}

using gcc -O2 -S (and clang) on MacOS, it generates:

_nothing:
    pushq   %rbp
    movq    %rsp, %rbp
    popq    %rbp
    ret

Why does gcc not remove everything but the ret? It seems like an easy optimisation to make unless it really does something (seems not to, to me). This pattern (push/move at the beginning, pop at the end) is also visible in other non-empty functions where rbp is otherwise unused.

On Linux using a more recent gcc (4.4.5) I see just

nothing:
    rep
    ret

Why the rep ? The rep is absent in non-empty functions.

William Morris asked Oct 21 '22 03:10


2 Answers

Why the rep ?

The reasons are explained in this blog post. In short, jumping directly to a single-byte ret instruction would mess up the branch prediction on some AMD processors. And rather than adding a nop before the ret, a meaningless prefix byte was added to save instruction decoding bandwidth.

The rep is absent in non-empty functions.

To quote from the blog post I linked to: "[rep ret] is preferred to the simple ret either when it is the target of any kind of branch, conditional (jne/je/...) or unconditional (jmp/call/...)".
In the case of an empty function, the ret would have been the direct target of a call. In a non-empty function, it wouldn't be.
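
To see the "ret as a branch target" case in a non-empty function, here is a minimal sketch. The function name `fill` is made up for illustration. The early exit for `n == 0` and the loop's back-edge both sit right next to the final return, so this is the sort of place where a gcc tuned for the affected AMD processors might emit `rep ret` instead of a plain `ret`. Whether it actually does depends on the gcc version and the `-mtune` setting, so check the output of `gcc -O2 -S` on your own machine rather than trusting this.

```c
#include <stddef.h>

/* Hypothetical example: store v into the first n elements of p.
 * When n is 0 the function branches straight to its return, and the
 * loop's conditional back-edge sits immediately before the return. */
void fill(int *p, size_t n, int v)
{
    for (size_t i = 0; i < n; i++) {
        p[i] = v;
    }
}
```

By contrast, a function whose return is only ever reached by falling through straight-line code has no branch landing on the `ret`, which matches the quote above.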

Why does gcc not remove everything but the ret?

Some compilers, or some platform configurations of the same compiler, keep the frame-pointer setup (the push/mov/pop of rbp) even at -O2. With gcc, at least, you can explicitly ask for it to be dropped with the -fomit-frame-pointer option.
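
A quick way to check this yourself is sketched below. The file name `nothing.c` and the helper `call_nothing` are only there for illustration; the compile commands in the comment use the options already mentioned in the question and this answer. The exact assembly you get will vary with compiler and platform.

```c
/* nothing.c (hypothetical file name)
 *
 * Compare the assembly produced by:
 *   gcc -O2 -S nothing.c -o with_default.s
 *   gcc -O2 -fomit-frame-pointer -S nothing.c -o without_fp.s
 */
void nothing(void)
{
}

/* Illustrative caller, so there is a real call site targeting nothing().
 * The compiler is free to inline or drop the call at -O2. */
int call_nothing(void)
{
    nothing();
    return 0;
}
```

If the push/mov/pop of rbp disappears in the second file, the prologue in the first was just the platform's default frame-pointer policy and not something the function needed.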

Michael answered Oct 28 '22 23:10


As explained here: http://support.amd.com/us/Processor_TechDocs/25112.PDF, a two-byte near-return instruction (i.e. rep ret) is used because a single-byte return can be mispredicted on some amd64 processors in some situations, such as this one.

If you fiddle around with the processor targeted by gcc you may find that you can get it to generate a single-byte ret. -mtune=nocona worked for me.

CB Bailey answered Oct 28 '22 21:10