How does a compiler optimise this factorial function so well?

So I have been looking at some of the magic that is -O3 in GCC (well, actually I'm compiling with Clang, but GCC produces the same result, and I'm guessing the two optimisers use many of the same techniques).

Consider this C program:

int foo(int n) {
    if (n == 0) return 1;
    return n * foo(n-1);
}

int main() {
    return foo(10);
}

The first thing I was pretty WOW-ed by (people were also WOW-ed by it in this question: https://stackoverflow.com/a/414774/1068248) was how int foo(int) (a basic factorial function) compiles into a tight loop. This is the ARM assembly for it:

    .globl  _foo
    .align  2
    .code   16
    .thumb_func _foo
_foo:
    mov r1, r0
    movs    r0, #1
    cbz r1, LBB0_2
LBB0_1:
    muls    r0, r1, r0
    subs    r1, #1
    bne LBB0_1
LBB0_2:
    bx  lr

Blimey, I thought. That's pretty interesting! A completely tight loop to compute the factorial. WOW. It's not a tail call optimisation since, well, it's not a tail call. But it appears to have done a very similar optimisation.
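
Reading the assembly back into C by hand, the loop seems to behave roughly like the sketch below. The name foo_loop is mine (the compiler emits nothing of the sort), and this only mirrors the generated instructions one by one; it is not a claim about what the optimiser does internally.

```c
/* Original recursive definition from the question. */
int foo(int n) {
    if (n == 0) return 1;
    return n * foo(n - 1);
}

/* Hand-written reading of the generated loop (hypothetical name).
   acc plays the role of r0, n the role of r1. */
int foo_loop(int n) {
    int acc = 1;            /* movs r0, #1      */
    if (n == 0) return acc; /* cbz  r1, LBB0_2  */
    do {
        acc = acc * n;      /* muls r0, r1, r0  */
        n = n - 1;          /* subs r1, #1      */
    } while (n != 0);       /* bne  LBB0_1      */
    return acc;             /* bx   lr          */
}
```

For small non-negative n (up to 12, where the result still fits in a 32-bit int) the two functions should agree, which is easy to check by calling both from a test main.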

Now look at main:

    .globl  _main
    .align  2
    .code   16
    .thumb_func _main
_main:
    movw    r0, #24320
    movt    r0, #55
    bx  lr

That just blew my mind, to be honest. It's totally bypassing foo and just returning 3628800, which is 10!.
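
The constant can be checked by hand: movw loads the low 16 bits of r0 and movt the high 16 bits, so the returned value is 55 * 65536 + 24320 = 3628800. A small C sketch of that arithmetic, where the helper name movw_movt is made up purely for illustration:

```c
#include <stdint.h>

/* Combine a movt immediate (upper 16 bits) with a movw immediate
   (lower 16 bits) into the 32-bit value the register ends up holding.
   Hypothetical helper, not part of any toolchain. */
static uint32_t movw_movt(uint32_t upper, uint32_t lower) {
    return (upper << 16) | (lower & 0xFFFFu);
}
```

Calling movw_movt(55, 24320) gives 3628800, matching 10!.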

This really makes me realise how your compiler can often do a much better job of optimising your code than you can. But it raises the question: how does it manage to do such a good job? So, can anyone explain (possibly by linking to relevant code) how the following optimisations work:

The initial foo optimisation to be a tight loop.
The optimisation where main just goes and returns the result directly rather than actually executing foo.

Another nice side effect of this question would be to show some more of the interesting optimisations which GCC/Clang can do.
