Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is using a third variable faster than an addition trick?

When computing fibonacci numbers, a common method is mapping the pair of numbers (a, b) to (b, a + b) multiple times. This can usually be done by defining a third variable c and doing a swap. However, I realised you could do the following, avoiding the use of a third integer variable:

b = a + b;  // b2 = a1 + b1
a = b - a;  // a2 = b2 - a1 = b1, Ta-da!

I expected this to be faster than using a third variable, since in my mind this new method should only have to consider two memory locations.

So I wrote the following C programs comparing the processes. These mimic the calculation of fibonacci numbers, but rest assured I am aware that they will not calculate the correct values due to size limitations.

(Note: I realise now that it was unnecessary to make n a long int, but I will keep it as it is because that is how I first compiled it)

File: PlusMinus.c

// Using the 'b=a+b;a=b-a;' method.
#include <stdio.h>

int main() {
    long int n = 1000000; // Number of iterations.
    long int a,b;

    a = 0; b = 1;
    while (n--) {
        b = a + b;
        a = b - a;
    }

    printf("%lu\n", a);
}

File: ThirdVar.c

// Using the third-variable method.
#include <stdio.h>

int main() {
    long int n = 1000000; // Number of iterations.
    long int a,b,c;

    a = 0; b = 1;
    while (n--) {
        c = a;
        a = b;
        b = b + c;
    }

    printf("%lu\n", a);
}

When I run the two with GCC (no optimisations enabled) I notice a consistent difference in speed:

$ time ./PlusMinus
14197223477820724411

real    0m0.014s
user    0m0.009s
sys     0m0.002s

$ time ./ThirdVar
14197223477820724411

real    0m0.012s
user    0m0.008s
sys     0m0.002s

When I run the two with GCC with -O3, the assembly outputs are equal. (I suspect I had confirmation bias when stating that one just outperformed the other in previous edits.)

Inspecting the assembly for each, I see that PlusMinus.s actually has one less instruction than ThirdVar.s, but runs consistently slower.

Question

Why does this time difference occur? Not only at all, but also why is my addition/subtraction method slower contrary to my expectations?

like image 283
AJF Avatar asked Mar 05 '23 22:03

AJF


1 Answers

Why does this time difference occur?

There is no time difference when compiled with optimizations (under recent versions of gcc and clang). For instance, gcc 8.1 for x86_64 compiles both to:

Live at Godbolt

.LC0:
        .string "%lu\n"
main:
        sub     rsp, 8
        mov     eax, 1000000
        mov     esi, 1
        mov     edx, 0
        jmp     .L2
.L3:
        mov     rsi, rcx
.L2:
        lea     rcx, [rdx+rsi]
        mov     rdx, rsi
        sub     rax, 1
        jne     .L3
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf
        mov     eax, 0
        add     rsp, 8
        ret

Not only at all, but also why is my addition/subtraction method slower contrary to my expectations?

Adding and subtracting could be slower than just moving. However, in most architectures (e.g. a x86 CPU), it is basically the same (1 cycle plus the memory latency); so this does not explain it.

The real problem is, most likely, the dependencies between the data. See:

b = a + b;
a = b - a;

To compute the second line, you have to have finished computing the value of the first. If the compiler uses the expressions as they are (which is the case under -O0), that is what the CPU will see.

In your second example, however:

c = a;
a = b;
b = b + c;

You can compute both the new a and b at the same time, since they do not depend on each other. And, in a modern processor, those operations can actually be computed in parallel. Or, putting it another way, you are not "stopping" the processor by making it wait on a previous result. This is called Instruction-level parallelism.

like image 78
Acorn Avatar answered Mar 10 '23 11:03

Acorn