Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is gcc -O2 doing to the recursive Fibonacci function?

I am learning x86 assembler in order to write a compiler. In particular, I'm taking a variety of simple recursive functions and feeding them through different compilers (OCaml, GCC etc.) in order to get a better understanding of the kinds of assembler generated by different compilers.

I've got a trivial recursive integer Fibonacci function:

int fib(int x) { return (x < 2 ? x : fib(x-1)+fib(x-2)); }

My hand-coded assembly looks like this:

fib:
    cmp eax, 2
    jl  fin
    push    eax
    dec eax
    call    fib
    push    eax
    mov eax, [esp+4]
    add eax, -2
    call    fib
    add eax, [esp]
    add esp, 8
fin:
    ret

Compiling that function to Intel-syntax assembler using gcc -O2 produces this enigmatic code:

_fib:
    push    edi
    push    esi
    push    ebx
    sub esp, 16
    mov edi, DWORD PTR [esp+32]
    cmp edi, 1
    jle L4
    mov ebx, edi
    xor esi, esi
L3:
    lea eax, [ebx-1]
    mov DWORD PTR [esp], eax
    call    _fib
    sub ebx, 2
    add esi, eax
    cmp ebx, 1
    jg  L3
    and edi, 1
L2:
    lea eax, [esi+edi]
    add esp, 16
    pop ebx
    pop esi
    pop edi
    ret
L4:
    xor esi, esi
    jmp L2

So I guess the calling convention is argument at [esp+4] and return value in eax. It starts by pushing edi, esi and ebx. Then it claims another 16 bytes for a stack frame, enough for 4 temporary ints. Then edi is read from [esp+32], which is the argument. If the argument is <=1 then it jumps to L4 which zeroes out (?) esi before jumping back to L2 which sets eax=esi+edi which is just the argument edi. If the argument was >1 then the argument is copied into ebx and zeroes esi before falling through into L3. In L3, it sets eax=ebx-1 and stores the result (n-1) at esp in the stack frame before recursing to calculate fib(n-1). The result is added to esi, ebx is set to n-2 and it loops back to L3 if ebx>1 otherwise it extracts the lower bit of edi before falling through to L2.

Why is this code so convoluted (e.g. is there a name for an optimization that has been done that I'm not seeing?)?

The recursive calls fib(n-2) seem to have been replaced with a loop accumulating in esi but that call wasn't in tail position so how was this done?

What is the purpose of the and edi, 1?

What is the purpose of the mov DWORD PTR [esp], eax?

Why is the stack frame so large?

Can you disassemble this algorithm back into C to make it clearer what is going on?

My preliminary impression is that GCC generates pretty poor x86 assembler. In this case, over 2× more code for equal performance (3.25s for fib(40) on this 1.6GHz Atom for both programs). Is that fair?

like image 209
J D Avatar asked Apr 07 '12 21:04

J D


3 Answers

In addition to the comments above, note that the recursion has been unwound into a tail call by replacing:

return x < 2 ? x : fib(x - 2) + fib(x - 1);

with:

if ((xprime = x) < 2) {
    acc = 0;
} else {
    /* at this point we know x >= 2 */
    acc = 0; /* start with 0 */
    while (x > 1) {
       acc += fib(x - 1); /* add fib(x-1) */
       x -= 2; /* now we'll add fib(x-2) */
    }
    /* so at this point we know either x==1 or x==0 */
    xprime = x == 1 ? 1 : 0; /* ie, x & 1 */
}
return xprime + acc;

I suspect this rather tricky loop arose from multiple optimization steps, not that I have fiddled with gcc optimization since about gcc 2.3 (it's all very different inside now!).

like image 116
torek Avatar answered Oct 12 '22 05:10

torek


Quite simply, fib(x-2) is equal to fib(x-3) + fib(x-4), fib(x-4) is equal to fib(x-5) + fib(x-6) etc. so fib(x) is being calculated as fib(x-1) + fib(x-3) + fib(x-5) + ... + fib(x&1) (fib(x&1) equals x&1) i.e. gcc has inlined the call to fib(x-2), which is quite a clever thing to do to a recursive function.

like image 23
Neil Avatar answered Oct 12 '22 05:10

Neil


This first part is ensuring registers that should be preserved according to the calling convention are not trashed. I would guess the calling convention used here is cdecl.

_fib:
    push    edi
    push    esi
    push    ebx
    sub esp, 16

DWORD PTR[esp+32] is your x:

    mov edi, DWORD PTR [esp+32]
    cmp edi, 1
    jle L4

If x is less than or equal to 1 (this corresponds to your x < 2), then go to L4 which is:

L4:
    xor esi, esi
    jmp L2

This zeroes out esi and branches to L2:

L2:
    lea eax, [esi+edi]
    add esp, 16
    pop ebx
    pop esi
    pop edi
    ret

This sets eax (the return value) with esi+edi. Since esi is 0 already, edi is just loaded in the case of 0 and 1. This corresponds to x < 2 ? x.

Now we look at the case when x is not < 2:

    mov ebx, edi
    xor esi, esi
L3:
    lea eax, [ebx-1]
    mov DWORD PTR [esp], eax
    call    _fib

First, x is copied to ebx, then esi is zeroed.

Next, eax is set to x - 1. This value is moved to the top of the stack and _fib called. This corresponds to fib(x-1).

    sub ebx, 2
    add esi, eax

This subtracts 2 from ebx (x). Then eax (return value from _fib call is added to esi, which was set to 0 before). Hence esi now holds the result of fib(x-1).

    cmp ebx, 1
    jg  L3
    and edi, 1

ebx is compared to 1. If it is greater than 1, then we loop back to L3. Otherwise (the case where it is 0 or 1), we perform and edi, 1 and fall through to L2 (we analysed what this does earlier already). The and edi, 1 is equivalent to performing a %2 on x.

From a high level, this is what the code does:

  • Sets up stack frame and saves registers
  • If x < 2, then return x.
  • Keep calling and summing fib(x-...) until x is smaller than 2. In this case, fall through to the x < 2 case.

The optimization is that GCC unwinds the cases where x >= 2 by doing them in a loop instead of recursively.

like image 40
Mike Kwan Avatar answered Oct 12 '22 05:10

Mike Kwan