Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does a compiler optimise this factorial function so well?

Tags:

People also ask

How does the compiler optimize?

Compiler optimization is generally implemented using a sequence of optimizing transformations, algorithms which take a program and transform it to produce a semantically equivalent output program that uses fewer resources or executes faster.

How does the C++ compiler optimize?

The C/C++ compiler compiles each source file separately and produces the corresponding object file. This means the compiler can only apply optimizations on a single source file rather than on the whole program. However, some important optimizations can be performed only by looking at the whole program.

Why do compilers perform optimization in code?

The code optimization in the synthesis phase is a program transformation technique, which tries to improve the intermediate code by making it consume fewer resources (i.e. CPU, Memory) so that faster-running machine code will result.


So I have been having a look at some of the magic that is O3 in GCC (well actually I'm compiling using Clang but it's the same with GCC and I'm guessing a large part of the optimiser was pulled over from GCC to Clang).

Consider this C program:

int foo(int n) {
    if (n == 0) return 1;
    return n * foo(n-1);
}

int main() {
    return foo(10);
}

The first thing I was pretty WOW-ed at (which was also WOW-ed at in this question - https://stackoverflow.com/a/414774/1068248) was how int foo(int) (a basic factorial function) compiles into a tight loop. This is the ARM assembly for it:

    .globl  _foo
    .align  2
    .code   16
    .thumb_func _foo
_foo:
    mov r1, r0
    movs    r0, #1
    cbz r1, LBB0_2
LBB0_1:
    muls    r0, r1, r0
    subs    r1, #1
    bne LBB0_1
LBB0_2:
    bx  lr

Blimey I thought. That's pretty interesting! Completely tight looping to do the factorial. WOW. It's not a tail call optimisation since, well, it's not a tail call. But it appears to have done a much similar optimisation.

Now look at main:

    .globl  _main
    .align  2
    .code   16
    .thumb_func _main
_main:
    movw    r0, #24320
    movt    r0, #55
    bx  lr

That just blew my mind to be honest. It's just totally bypassing foo and returning 3628800 which is 10!.

This makes me really realise how your compiler can often do a much better job than you can at optimising your code. But it raises the question, how does it manage to do such a good job? So, can anyone explain (possibly by linking to relevant code) how the following optimisations work:

  1. The initial foo optimisation to be a tight loop.

  2. The optimisation where main just goes and returns the result directly rather than actually executing foo.

Also another interesting side effect of this question would be to show some more interesting optimisations which GCC/Clang can do.