Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

GCC optimization differences in recursive functions using globals

Tags:

c

gcc

The other day I ran into a weird problem using GCC and the '-Ofast' optimization flag. Compiling the below program using 'gcc -Ofast -o fib1 fib1.c'.

#include <stdio.h>

int f1(int n) {
    if (n < 2) {
        return n;
    }
    int a, b;
    a = f1(n - 1);
    b = f1(n - 2);
    return a + b;
}

int main(){
    printf("%d", f1(40));
}

When measuring execution time, the result is:

peter@host ~ $ time ./fib1
102334155
real    0m0.511s
user    0m0.510s
sys     0m0.000s

Now let's introduce a global variable in our program and compile again using 'gcc -Ofast -o fib2 fib2.c'.

#include <stdio.h>

int global;

int f1(int n) {
    if (n < 2) {
        return n;
    }
    int a, b;
    a = f1(n - 1);
    b = f1(n - 2);

    global = 0;

    return a + b;
}

int main(){
    printf("%d", f1(40));
}

Now the execution time is:

peter@host ~ $ time ./fib2
102334155
real    0m0.265s
user    0m0.265s
sys     0m0.000s

The new global variable does not do anything meaningful. However, the difference in execution time is considerable.

Apart from the question (1) what the reason is for such behavior, it also would be nice if (2) the last performance could be achieved without introducing meaningless variables. Any suggestions?

Thanks Peter

like image 380
Peter Avatar asked Dec 23 '15 13:12

Peter


1 Answers

I believe you hit some very clever and very weird gcc (mis-?)optimization. That's about as far as I got in researching this.

I modified your code to have an #ifdef G around the global:

$ cc -O3 -o foo foo.c && time ./foo
102334155

real    0m0.634s
user    0m0.631s
sys     0m0.001s
$ cc -O3 -DG -o foo foo.c && time ./foo
102334155

real    0m0.365s
user    0m0.362s
sys     0m0.001s

So I have the same weird performance difference.

When in doubt, read the generated assembler.

$ cc -S -O3 -o foo.s -S foo.c
$ cc -S -DG -O3 -o foog.s -S foo.c

Here it gets truly bizarre. Normally I can follow gcc-generated code pretty easily. The code that got generated here is just incomprehensible. What should be pretty straightforward recursion and addition that should fit in 15-20 instructions, gcc expanded to a several hundred instructions with a flurry of shifts, additions, subtractions, compares, branches and a large array on the stack. It looks like it tried to partially convert one or both recursions into an iteration and then unrolled that loop. One thing struck me though, the non-global function had only one recursive call to itself (the second one is the call from main):

$ grep 'call.*f1' foo.s | wc
      2       4      18

While the global one one had:

$ grep 'call.*f1' foog.s | wc
     33      66     297

My educated (I've seen this many times before) guess? Gcc tried to be clever and in its fervor the function that in theory should be easier to optimize generated worse code while the write to the global variable made it sufficiently confused that it couldn't optimize so hard that it led to better code. This happens all the time, many optimizations that gcc (and other compilers too, let's not single them out) uses are very specific to certain benchmarks they use and might not generate faster running code in many other cases. In fact, from experience I only ever use -O2 unless I've benchmarked things very carefully to see that -O3 makes sense. It very often doesn't.

If you really want to research this further, I'd recommend reading gcc documentation about which optimizations get enabled with -O3 as opposed to -O2 (-O2 doesn't do this), then try them one by one until you find which one causes this behavior and that optimization should be a pretty good hint for what's going on. I was about to do this, but I ran out of time (must do last minute christmas shopping).

like image 128
Art Avatar answered Sep 29 '22 12:09

Art