This is the code that I tested:
#include <iostream>
#include <chrono>
using namespace std;
#define CHRONO_NOW chrono::high_resolution_clock::now()
#define CHRONO_DURATION(first,last) chrono::duration_cast<chrono::duration<double>>(last-first).count()
int fib(int n) {
    if (n<2) return n;
    return fib(n-1) + fib(n-2);
}

int main() {
    auto t0 = CHRONO_NOW;
    cout << fib(45) << endl;
    cout << CHRONO_DURATION(t0, CHRONO_NOW) << endl;
    return 0;
}
Of course, there are much faster ways of calculating Fibonacci numbers, but this is a good little stress test that focuses on recursive function calls. There's nothing else to the code, other than the use of chrono for measuring time.
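(For reference, a linear-time version would look something like the sketch below; fib_iter is just an illustrative name and is not part of the test.)

// Iterative Fibonacci, shown only for contrast: O(n) additions instead
// of an exponential number of recursive calls. Not part of the benchmark.
long long fib_iter(int n) {
    long long a = 0, b = 1;        // fib(0) and fib(1)
    for (int i = 0; i < n; ++i) {
        long long next = a + b;
        a = b;
        b = next;
    }
    return a;                      // fib(n)
}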
First I ran the test a couple of times in Xcode on OS X (so that's clang), using -O3 optimization. It took about 9 seconds to run.
Then I compiled the same code with gcc (g++) on Ubuntu (again with -O3), and that version took only about 6.3 seconds to run! Also, I was running Ubuntu inside VirtualBox on my Mac, which could only have hurt performance, if anything.
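(To rule out one-off noise from the VM, a variant of main() that repeats the measurement and keeps the fastest run would look something like the sketch below; it reuses the macros from the snippet above, and the best-of-3 scheme is my own choice, not something the original test did.)

int main() {
    double best = 1e9;                        // lowest time observed so far
    for (int run = 0; run < 3; ++run) {
        auto t0 = CHRONO_NOW;
        cout << fib(45) << endl;              // printing keeps fib(45) from being optimized away
        double d = CHRONO_DURATION(t0, CHRONO_NOW);
        if (d < best) best = d;
    }
    cout << "best of 3: " << best << " s" << endl;
    return 0;
}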
So there you go: roughly 9 seconds with clang versus roughly 6.3 seconds with gcc for the exact same code.
I know that these are completely different compilers, so they do things differently, but all the comparisons of gcc and clang I've seen showed a much smaller difference, and in some cases it went the other way (clang being faster).
So is there any logical explanation why gcc beats clang by miles in this particular example?
GCC 4.9.2 in Compiler Explorer really does loop unrolling and inlines a lot of the calls, while Clang 3.5.1 simply calls fib twice on each invocation, without even tail-call optimization, as shown below:
fib(int):                            # @fib(int)
        push    rbp
        push    rbx
        push    rax
        mov     ebx, edi
        cmp     ebx, 2
        jge     .LBB0_1
        mov     eax, ebx
        jmp     .LBB0_3
.LBB0_1:
        lea     edi, dword ptr [rbx - 1]
        call    fib(int)             # fib(ebx - 1)
        mov     ebp, eax
        add     ebx, -2
        mov     edi, ebx
        call    fib(int)             # fib(ebx - 2)
        add     eax, ebp
.LBB0_3:
        add     rsp, 8
        pop     rbx
        pop     rbp
        ret
The GCC version is more than 10 times longer, with only a single fib call and 20+ labels from inlining that call. This also means that the last call has been tail-optimized into a jmp, or that GCC has converted some of the recursion into iteration (since it allocates a big array for storing intermediate values). A hand-written illustration of that second idea follows below.
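To get a feel for what "converting some of the recursion into iteration" means at the source level, here is a sketch of my own (not what GCC literally emits): one of the two recursive branches can be turned into a loop, with the contributions of the other branch accumulated along the way.

// Hand-written sketch: the fib(n-1) branch has been turned into a loop,
// so only the fib(n-2) branch still recurses. Repeatedly expanding
// fib(n-1) gives fib(n) = fib(n-2) + fib(n-3) + ... + fib(0) + 1,
// which is what the loop accumulates. This only illustrates the idea;
// it is not GCC's actual transformation.
int fib_partly_iterative(int n) {
    int acc = 0;
    while (n >= 2) {
        acc += fib_partly_iterative(n - 2); // the branch that still recurses
        n -= 1;                             // the other branch is now a loop
    }
    return acc + n;                         // n is 0 or 1 here (base case)
}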
I've also brought ICC into the picture, and surprisingly it has 10 call instructions inside fib and also inlines fib 9 times inside main, but it doesn't convert the recursive code into an iterative one.
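What that partial inlining corresponds to at the source level can also be sketched by hand (again my own illustration, not ICC's or GCC's actual output): the bodies of fib(n-1) and fib(n-2) are substituted once, so each invocation does the work of three of the original calls, and the compilers effectively do this several levels deep.

// fib with one level of recursion inlined by hand. The ternaries are the
// inlined "if (n<2) return n;" base cases of the two callees.
int fib_inlined_once(int n) {
    if (n < 2) return n;
    int a = (n - 1 < 2) ? (n - 1)                                   // inlined fib(n-1)
                        : fib_inlined_once(n - 2) + fib_inlined_once(n - 3);
    int b = (n - 2 < 2) ? (n - 2)                                   // inlined fib(n-2)
                        : fib_inlined_once(n - 3) + fib_inlined_once(n - 4);
    return a + b;
}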
Here are the compiler outputs for comparison.
Note that you can modify the code like this to make the output easier to read:
int fib(int n) {
    if (n<2) return n;
    int t = fib(n-1);
    return t + fib(n-2);
}
Now Compiler Explorer will highlight, with distinct colors, which source line each instruction in the assembly output corresponds to, and you'll easily see how the two calls are made. For example, the line return t + fib(n-2) is compiled by GCC to:
.L3:
        mov     eax, DWORD PTR [rsp+112]   # n, %sfp
        add     edx, DWORD PTR [rsp+64]    # D.35656, %sfp
        add     DWORD PTR [rsp+60], edx    # %sfp, D.35656
        sub     DWORD PTR [rsp+104], 2     # %sfp,