Why is the function fib2 much faster than the functions fib and fib3?

Question

The following C code, why is the function fib2 much faster than the functions fib and fib3?

compile: gcc -O3 main.c -o main

gcc version:

gcc (GCC) 10.2.1 20200825 (Alibaba 10.2.1-3.6 2.32)
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE

output:

102334155   fib Time spent: 0.185822 seconds
1059467309  fib2 Time spent: 0.036215 seconds  // why faster?
102334155   fib3 Time spent: 0.181988 seconds

Code:

#include <stdio.h>
#include <time.h>

long long fib2(int n)
{
    if (n <= 1)
    {
        return n;
    }
    else
    {
        long long first = fib2(n - 1);
        long long second = fib2(n - 2);
        return first * first + second * second;
    }
}

int fib(int n)
{
    if (n <= 1)
    {
        return n;
    }
    else
    {
        return fib(n - 1) + fib(n - 2);
    }
}

long long fib3(int n)
{
    if (n <= 1)
    {
        return n;
    }
    else
    {
        long long first = fib3(n - 1);
        long long second = fib3(n - 2);
        return first + second;
    }
}

void main()
{
    clock_t start1 = clock();
    printf("%d", fib(40));
    clock_t end1 = clock();
    double time_spent1 = (double)(end1 - start1) / CLOCKS_PER_SEC;
    printf("fib Time spent: %f seconds
", time_spent1);
    printf("%d", fib2(40));
    clock_t end2 = clock();
    double time_spent2 = (double)(end2 - end1) / CLOCKS_PER_SEC;
    printf("fib2 Time spent: %f seconds
", time_spent2);
    printf("%d", fib3(40));
    clock_t end3 = clock();
    double time_spent3 = (double)(end3 - end2) / CLOCKS_PER_SEC;
    printf("fib3 Time spent: %f seconds
", time_spent3);
}

Eric Postpischil · Accepted Answer

Analysis of the generated assembly code shows the compiler partially expands the recursive calls in fib2, and this reduces the number of recursive calls by a factor over 800 for the n = 40 case.

For n above 1, the C code nominally computes fib2(n) by calling fib2(n-1) and fib2(n-2). The generated code handles n up to 3 with fixed return values, after which it computes fib2(n) from call for fib2(n-3), fib2(n-4), fib2(n-5), and fib2(n-6). (This is seen by manually tracing the generated code.) While this is four calls from one fib2(n) call, it skips down two levels of recursion. Below is code that tells us about the effect of that by counting the numbers of recursive calls made:

#include <stdio.h>


//Count calls made by original recursion.
long long fibx(int n)
{
    if (n <= 1)
        return 1;
    else
        return 1 + fibx(n-1) + fibx(n-2);
}


// Count calls made by optimized code.
long long fiby(int n)
{
    if (n <= 3)
        return 1;
    else
        return 1 + fiby(n-3) + fiby(n-4) + fiby(n-5) + fiby(n-6);
}


int main(void)
{
    printf("%lld
", fibx(40));
    printf("%lld
", fiby(40));
}

This shows 331,160,281 calls for the original code and 386,393 for the optimized code! While the optimized code has additional arithmetic per call (to combine the results calculated from the various calls), it clearly eliminates a great deal of repetitive work.

This explains why fib2 is so fast. It does not address why fib and fib3 remain slow. I have not analyzed their code yet, although, interestingly, each contains only one recursive call—but it is in a loop, so the routine executes some number of recursive calls. That seems like it would be interesting to investigate.

Also, I only analyzed the code generated for the external entry point to fib2. The compiler also generates a special fib2.part.0 version of the routine that is called by main. It does some things of its own before (sometimes) calling fib2. Further, the compiler inlined one recursive call into main; it does not call fib2.part.0(40) but rather calls fib2.part.0(39) and fib2.part.0(38) and calculates the sum of their squares.

In any case, it seems clear the performance difference is due to optimization choices made by the compiler.

Hoàng Lê Ngọc · Answer

The comparison between these three processing functions was unfair from the beginning. For example, when I tested with the 40th Fibonacci number, the results were completely reversed (fib2 had a longer processing time).

102334155  fib Time spent: 0.401000 seconds
1059467309 fib2 Time spent: 0.815000 seconds
102334155  fib3 Time spent: 0.423000 seconds

On the other hand, in my opinion, calculations on larger numbers will cause longer processing times:

Between the fib and fib3 functions, the calculation time will be the same if the result is a small Fibonacci number within the int range. However, if the number is larger, fib will only produce an int result, while fib3 returns a long long type, which requires more computational work.
It is impossible to compare fib2 with fib and fib3 because their recursive formulas are different; fib2 always uses more calculations. It might be because the result after each calculation of fib2, due to frequent overflows, is much smaller, so the processing time seems faster, but this is, of course, just an illusion.

Why is the function fib2 much faster than the functions fib and fib3?

Tags:

c

zmhuang

2 Answers

Eric Postpischil

Hoàng Lê Ngọc

Recent Activity

Donate For Us

Why is the function fib2 much faster than the functions fib and fib3?

Tags:

c

zmhuang

2 Answers

Eric Postpischil

Hoàng Lê Ngọc

Related questions

Recent Activity

Donate For Us