
Why is one of these dynamic programming implementations of the Fibonacci sequence faster than the other?

Tags:

c++

fibonacci

gmp

I've been researching and benchmarking various Fibonacci algorithms recently for my own amusement, and more or less by accident I came up with an alternative to the classic O(n)-time, O(1)-space dynamic programming implementation.

Consider the following two functions:

BigInt fib_dp_classic(int n) {
  if (n == 0) {
    return 0;
  }

  BigInt x = 0, y = 1, z;
  for (int i = 2; i <= n; ++i) {
    z = x + y;
    x = y;
    y = z;
  }

  return y;
}

and

BigInt fib_dp_mod(int n) {
  BigInt x = 0, y = 1, z = 1;
  for (int i = 0; i < n; ++i) {
    switch (i % 3) {
      case 0:
        y = x + z;
        break;
      case 1:
        z = x + y;
        break;
      case 2:
        x = y + z;
        break;
    }
  }

  switch (n % 3) {
    case 0:
      return x;
      break;
    case 1:
      return y;
      break;
    case 2:
      return z;
      break;
  }
}

On my machine, calculating the millionth Fibonacci number takes 6.55 seconds with fib_dp_classic and 2.83 seconds with fib_dp_mod, and even turning on -O3 doesn't change this much. I don't really have any good ideas as to why the mod version is faster. Is it because the extra store instructions in the classic version are more expensive than the mod in the second? My understanding is that the compiler should be keeping all three variables in registers in both versions, and that computing the mod is actually fairly expensive; is this not the case?

In fact, I just put both of these through Compiler Explorer, and both use only registers once you turn optimizations on. Granted, that was with plain ints, not the GMP-based bigints I was actually using for my benchmark. Is there some weird GMP implementation detail that might be the cause here?
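For reference, this is the shape of the benchmark. The harness below is only a sketch, assuming BigInt is an alias for gmpxx's mpz_class (matching the GMP-based bigints mentioned above); the exact harness behind the numbers above may differ.

// Sketch of a benchmark harness; assumes BigInt is gmpxx's mpz_class.
// Typical build: g++ -O3 fib.cpp -lgmpxx -lgmp
#include <gmpxx.h>
#include <chrono>
#include <iostream>

using BigInt = mpz_class;

BigInt fib_dp_classic(int n);  // as defined above
BigInt fib_dp_mod(int n);      // as defined above

int main() {
  const int n = 1000000;

  auto t0 = std::chrono::steady_clock::now();
  BigInt a = fib_dp_classic(n);
  auto t1 = std::chrono::steady_clock::now();
  BigInt b = fib_dp_mod(n);
  auto t2 = std::chrono::steady_clock::now();

  std::cout << "classic: " << std::chrono::duration<double>(t1 - t0).count() << " s\n"
            << "mod:     " << std::chrono::duration<double>(t2 - t1).count() << " s\n"
            << "match:   " << (a == b) << "\n";
}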

Update: I even strace'd both to see if malloc() might be the culprit, and fib_dp_classic makes 130 syscalls (for n=1000000) while fib_dp_mod makes 133. So still no real clues...

Update 2: Yes, the buffer copies are the culprit (as geza pointed out), and I was dumb for not realizing it. Here are two alternative versions and their benchmark results:

BigInt fib_dp_move(int n) {
  if (n == 0) {
    return 0;
  }

  BigInt x = 0, y = 1, z;
  for (int i = 2; i <= n; ++i) {
    z = std::move(x) + y;
    x = std::move(y);
    y = std::move(z);
  }

  return y;
}

This runs in 2.84 seconds, so just about equivalent to the mod version since it eliminates the unnecessary copies.

BigInt fib_dp_swap(int n) {
  if (n == 0) {
    return 0;
  }

  BigInt x = 0, y = 1, z;
  for (int i = 2; i <= n; ++i) {
    z = x + y;
    swap(x, y);
    swap(y, z);
  }

  return y;
}

This (from geza) also runs in 2.84 seconds, so again about equivalent to the mod version since it eliminates the copies in basically the same way, just calling swap() instead of using move semantics.

asked Jul 28 '18 by Mark Schott


People also ask

What is the complexity of the dynamic programming solution for the Fibonacci series?

The time complexity of the dynamic programming solution for the Fibonacci series is Θ(N), i.e., linear: we compute the sum of two terms, and this is repeated N times. The space complexity of the dynamic programming solution is O(1).

What is dynamic programming, and how can the Fibonacci series problem be solved more efficiently using a dynamic programming approach?

Dynamic programming is a technique for solving recursive problems more efficiently. In plain recursion we often solve the same sub-problems repeatedly; in dynamic programming we store the solutions of these sub-problems so that we do not have to solve them again. This is called memoization.
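As a concrete illustration of the memoization idea described above, here is a small top-down sketch (using plain 64-bit integers for brevity rather than a bigint type):

#include <cstdint>
#include <unordered_map>

// Top-down DP: each Fibonacci sub-problem is computed once and cached.
std::uint64_t fib_memo(int n, std::unordered_map<int, std::uint64_t>& memo) {
  if (n < 2) return n;
  auto it = memo.find(n);
  if (it != memo.end()) return it->second;  // sub-problem already solved
  std::uint64_t result = fib_memo(n - 1, memo) + fib_memo(n - 2, memo);
  memo[n] = result;                         // store the solution (memoization)
  return result;
}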

How do you speed up Fibonacci?

Summary: The two fast Fibonacci algorithms are matrix exponentiation and fast doubling, each having an asymptotic complexity of Θ(log n) bigint arithmetic operations. Both algorithms use multiplication, so they become even faster when Karatsuba multiplication is used.
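For example, a fast-doubling routine along the lines summarized above could look like the sketch below (it assumes gmpxx's mpz_class as the bigint type and uses the identities F(2k) = F(k)·(2·F(k+1) − F(k)) and F(2k+1) = F(k)² + F(k+1)²):

#include <gmpxx.h>
#include <utility>

using BigInt = mpz_class;  // assumption: any bigint type with +, -, * would do

// Returns {F(n), F(n+1)} using the fast-doubling identities (needs C++17).
std::pair<BigInt, BigInt> fib_pair(unsigned long n) {
  if (n == 0) return {0, 1};
  auto [a, b] = fib_pair(n / 2);   // a = F(k), b = F(k+1), k = n / 2
  BigInt c = a * (2 * b - a);      // F(2k)
  BigInt d = a * a + b * b;        // F(2k+1)
  if (n % 2 == 0) return {c, d};   // n = 2k
  return {d, c + d};               // n = 2k + 1
}

BigInt fib_fast_doubling(unsigned long n) {
  return fib_pair(n).first;
}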

Can Fibonacci series be solved by dynamic programming?

Yes. The Fibonacci sequence is the series of numbers 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, …, where the next number is found by adding up the two numbers before it. The problem can be solved very easily using recursion, and it can also be solved with the dynamic programming technique, for example as a C++ program.


2 Answers

Your first function uses one addition and 3 copies of a BigInt per iteration -- all of these are quite time-consuming operations. The second function uses one addition and one copy -- that's where the savings come from: you save 2 BigInt copy operations per iteration.
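To make the accounting concrete, here is the classic loop with the per-iteration costs annotated (a sketch assuming a BigInt that deep-copies its buffer on copy assignment, such as gmpxx's mpz_class):

#include <gmpxx.h>
using BigInt = mpz_class;

BigInt fib_dp_classic_annotated(int n) {
  if (n == 0) return 0;
  BigInt x = 0, y = 1, z;
  for (int i = 2; i <= n; ++i) {
    z = x + y;  // one bigint addition (depending on the implementation,
                // assigning the result may cost another copy)
    x = y;      // deep copy of y's buffer
    y = z;      // deep copy of z's buffer
  }
  return y;     // fib_dp_mod performs only the addition in each iteration
}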

answered by lenik


In this case, there is no point comparing the GMP version with the simple int version. Fib(1000000) is a ~86 KB number, which is much slower to work with than a simple int. For ints, a copy can be a basically free operation in certain circumstances, while for GMP numbers it involves copying an ~86 KB buffer.

(Note, of course, that the copy won't always be 86 KB. At the beginning it is ~0 KB, but as the routine progresses it grows to 86 KB. And as the numbers grow, the routine becomes slower and slower, so the majority of the time is spent when the numbers are big.)
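(For a rough size check: F(n) ≈ φ^n/√5, so F(1000000) has about 1000000 · log2(φ) ≈ 694,000 bits, i.e. about 86 KB, or roughly 209,000 decimal digits.)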

Assuming a quality BigInt implementation, we have these operation counts in each iteration:

  • fib_dp_classic: 1 add, 2 copies
  • fib_dp_mod: 1 add

As you can see, the classic version does 2 extra copies (note that in a quality implementation, x = y + z doesn't involve a copy). And the speed of a copy is of the same order of magnitude as the add (an add may be 2x-3x slower than a copy). So this explains the ~2.3x slowdown of the classic routine.
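A quick way to sanity-check the copy-versus-add ratio is to time both operations directly on a number of roughly that size. The snippet below is only a sketch using gmpxx; the shift amount is just chosen to approximate the ~86 KB size of Fib(1000000):

#include <gmpxx.h>
#include <chrono>
#include <iostream>

int main() {
  mpz_class a = 1;
  a <<= 694000;                 // ~86 KB number, about the size of Fib(1000000)
  mpz_class b = a - 1, c;

  const int reps = 20000;
  auto t0 = std::chrono::steady_clock::now();
  for (int i = 0; i < reps; ++i) c = a;      // copy: memcpy of the limb buffer
  auto t1 = std::chrono::steady_clock::now();
  for (int i = 0; i < reps; ++i) c = a + b;  // add: one pass over both buffers
  auto t2 = std::chrono::steady_clock::now();

  std::cout << "copy: " << std::chrono::duration<double>(t1 - t0).count() << " s\n"
            << "add:  " << std::chrono::duration<double>(t2 - t1).count() << " s\n";
}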

Note that if BigInt were a reference-counted implementation, then the x = y operation could be basically free, because it doesn't need a copy, just an increment of a counter (in that case, the classic routine would have the same speed as the mod one).
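To illustrate the reference-counting point, here is a toy copy-on-write wrapper (a hypothetical type for illustration only, not how gmpxx actually behaves); with such a type, x = y only bumps a shared_ptr count:

#include <gmpxx.h>
#include <memory>

// Toy reference-counted bigint: copies share one immutable payload.
class RcBigInt {
  std::shared_ptr<const mpz_class> p_;
 public:
  RcBigInt(long v = 0) : p_(std::make_shared<const mpz_class>(v)) {}

  // The compiler-generated copy constructor/assignment just copy the
  // shared_ptr, i.e. increment a reference count -- no big buffer copy.

  friend RcBigInt operator+(const RcBigInt& a, const RcBigInt& b) {
    RcBigInt r;
    r.p_ = std::make_shared<const mpz_class>(*a.p_ + *b.p_);  // only the sum allocates
    return r;
  }

  const mpz_class& value() const { return *p_; }
};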

Last note: presumably you can speed up the classic version if a non-copying swap operation is available:

BigInt fib_dp_swap(int n) {
  if (n == 0) {
    return 0;
  }

  BigInt x = 0, y = 1, z;
  for (int i = 2; i <= n; ++i) {
    z = x + y;
    x.swap(y); // Note the 2 swaps here
    y.swap(z);
  }

  return y;
}

With gmpxx, this routine runs in the same time as the mod version, and it is much simpler.

This is because a swap operation can be much cheaper than a copy: swap just needs to exchange the pointers inside BigInt, while a copy needs an ~86 KB memory copy.

answered by geza