
Fast Fibonacci computation

I saw a comment on Google+ a few weeks ago in which someone demonstrated a straightforward computation of Fibonacci numbers that was not based on recursion and didn't use memoization. He effectively just remembered the last two numbers and kept adding them. This is an O(N) algorithm, but he implemented it very cleanly. So I quickly pointed out that a quicker way is to take advantage of the fact that Fibonacci numbers can be computed as powers of the matrix [[0,1],[1,1]], which requires only O(log(N)) matrix multiplications.
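
For reference, here is a minimal sketch of that iterative O(N) approach (a sketch, not the original commenter's code):

def fib_iter(n):
    a, b = 0, 1          # F(0), F(1)
    for _ in range(n):
        a, b = b, a + b  # keep only the last two numbers
    return a             # F(n)

print([fib_iter(i) for i in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]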

The problem, of course, is that this is far from optimal past a certain point. It is efficient as long as the numbers are not too large, but their length grows at the rate of N*log(phi)/log(10) digits, where F(N) is the Nth Fibonacci number and phi is the golden ratio ((1+sqrt(5))/2 ≈ 1.618). As it turns out, log(phi)/log(10) is very close to 1/5, so the Nth Fibonacci number can be expected to have roughly N/5 digits.
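
A quick sanity check of that digit-count estimate:

from math import log10, sqrt

phi = (1 + sqrt(5)) / 2
print(log10(phi))                     # ~0.20899, i.e. roughly 1/5 of a digit per index
print(int(100000 * log10(phi)) + 1)   # F(100,000) should have about 20,899 digits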

Matrix multiplication, and indeed plain number multiplication, gets very slow when the numbers have millions or billions of digits. So F(100,000) took about .03 seconds to compute (in Python), while F(1,000,000) took roughly 5 seconds. This is hardly O(log(N)) growth. My estimate was that this method, without improvements, only optimizes the computation to be O((log(N))^2.5) or so.

Computing a billionth Fibonacci number, at this rate, would be prohibitively slow (even though it would only have ~1,000,000,000 / 5 digits, so it easily fits within a 32-bit address space).

Does anyone know of an implementation or algorithm that would allow a faster computation? Perhaps something that would allow calculation of a trillionth Fibonacci number.

And just to be clear, I am not looking for an approximation. I am looking for the exact computation (to the last digit).

Edit 1: I am adding the Python code to show what I believe is an O((log N)^2.5) algorithm.

# Python 2 (uses long, xrange, the print statement and time.clock)
from operator import mul
from time import clock

class TwoByTwoMatrix:
    __slots__ = "rows"

    def __init__(self, m):
        self.rows = m

    def __imul__(self, other):
        # standard 2x2 matrix multiplication: rows of self dotted with columns of other
        self.rows = [[sum(map(mul, my_row, oth_col))
                      for oth_col in zip(*other.rows)]
                     for my_row in self.rows]
        return self

    def intpow(self, i):
        # square-and-multiply exponentiation; trailing zero bits of the
        # exponent are factored out first and squared back in at the end
        i = int(i)
        result = TwoByTwoMatrix([[long(1), long(0)], [long(0), long(1)]])  # identity
        if i <= 0:
            return result
        k = 0
        while i % 2 == 0:
            k += 1
            i >>= 1
        multiplier = TwoByTwoMatrix(self.rows)
        while i > 0:
            if i & 1:
                result *= multiplier
            multiplier *= multiplier  # square it
            i >>= 1
        for j in xrange(k):
            result *= result
        return result


m = TwoByTwoMatrix([[0,1],[1,1]])

t1 = clock()
print len(str(m.intpow(100000).rows[1][1]))
t2 = clock()
print t2 - t1

t1 = clock()
print len(str(m.intpow(1000000).rows[1][1]))
t2 = clock()
print t2 - t1

Edit 2: It looks like I didn't account for the fact that len(str(...)) would make a significant contribution to the overall runtime of the test. Changing the tests to

from math import log as log

t1 = clock()
print log(m.intpow(100000).rows[1][1])/log(10)
t2 = clock()
print t2 - t1

t1 = clock()
print log(m.intpow(1000000).rows[1][1])/log(10)
t2 = clock()
print t2 - t1

shortened the runtimes to .008 seconds and .31 seconds (from .03 seconds and 5 seconds when len(str(...)) was used).

Because M = [[0,1],[1,1]] raised to the power N is [[F(N-2), F(N-1)], [F(N-1), F(N)]], the other obvious source of inefficiency was calculating the (0,1) and (1,0) elements of the matrix as if they were distinct. The following exploits the symmetry (I also switched to Python 3, but the Python 2.7 times are similar):

class SymTwoByTwoMatrix():
    # elements (0,0), (0,1), (1,1) of a symmetric 2x2 matrix are a, b, c;
    # b is also the (1,0) element because the matrix is symmetric

    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

    def __imul__(self, other):
        # this multiplication is only correct because we are multiplying
        # powers of the same symmetric matrix: such powers commute, so the
        # product is again symmetric
        self.a, self.b, self.c = \
            self.a * other.a + self.b * other.b, \
            self.a * other.b + self.b * other.c, \
            self.b * other.b + self.c * other.c
        return self

    def intpow(self, i):
        # same square-and-multiply scheme as TwoByTwoMatrix.intpow above
        i = int(i)
        result = SymTwoByTwoMatrix(1, 0, 1)  # identity
        if i <= 0:
            return result
        k = 0
        while i % 2 == 0:
            k += 1
            i >>= 1
        multiplier = SymTwoByTwoMatrix(self.a, self.b, self.c)
        while i > 0:
            if i & 1:
                result *= multiplier
            multiplier *= multiplier # square it
            i >>= 1
        for j in range(k):
            result *= result
        return result

calculated F(100,000) in .006 seconds, F(1,000,000) in .235 seconds, and F(10,000,000) in 9.51 seconds.

This is to be expected: it produces results 45% faster on the fastest test, and the gain should asymptotically approach phi/(1 + 2*phi + phi*phi) ≈ 23.6%.

The (0,0) element of M^N is actually the (N-2)nd Fibonacci number:

m = SymTwoByTwoMatrix(0, 1, 1)  # the matrix [[0,1],[1,1]] in symmetric form
for i in range(15):
    x = m.intpow(i)
    print([x.a, x.b, x.c])

gives

[1, 0, 1]
[0, 1, 1]
[1, 1, 2]
[1, 2, 3]
[2, 3, 5]
[3, 5, 8]
[5, 8, 13]
[8, 13, 21]
[13, 21, 34]
[21, 34, 55]
[34, 55, 89]
[55, 89, 144]
[89, 144, 233]
[144, 233, 377]
[233, 377, 610]

I would expect that not having to calculate element (0,0) would produce an additional speed-up of 1/(1 + phi + phi*phi) ≈ 19%. But the lru_cache-of-F(2N)-and-F(2N-1) solution given by Eli Korvigo below actually gives a speed-up of 4 times (i.e., 75%). So, while I have not worked out a formal explanation, I am tempted to think that it caches the spans of 1's within the binary expansion of N and does the minimum number of multiplications necessary, which obviates the need to find those ranges, precompute them, and then multiply them in at the right point in the expansion of N. lru_cache allows for a top-to-bottom computation of what would otherwise have been a more complicated bottom-to-top computation.
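
One way to probe this is to look at the cache statistics that lru_cache exposes (using the fib() from Eli Korvigo's answer below):

fib.cache_clear()        # start from an empty cache
fib(10**6)
print(fib.cache_info())  # misses = distinct subproblems evaluated, only O(log N) of them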

Both SymTwoByTwoMatrix and the lru_cache-of-F(2N)-and-F(2N-1) solution take roughly 40 times longer every time N grows 10 times. I think that's possibly due to Python's implementation of long-int multiplication (a factor of ~40 for a 10x larger N is close to the 10^log2(3) ≈ 38 that Karatsuba multiplication would predict). I think the multiplication of large numbers, and their addition, should be parallelizable, so a multi-threaded sub-O(N) solution should be possible even though (as Daniel Fisher states in the comments) computing F(N) is Theta(N).
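
As a practical aside, the big-integer multiplication itself can be sped up by running the same arithmetic on GMP integers via the gmpy2 package (assuming it is installed), since GMP switches to FFT-based multiplication for very large operands while CPython's built-in ints do not. The SymTwoByTwoMatrix class above is agnostic to the integer type of its entries, so a sketch of that idea is simply:

from gmpy2 import mpz  # assumes gmpy2 is available

m_gmp = SymTwoByTwoMatrix(mpz(0), mpz(1), mpz(1))
print(len(str(m_gmp.intpow(1000000).c)))  # F(1,000,000), about 208,988 digits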

Asked by Dmitry Rubanovich, Jul 18 '16




1 Answer

Since the Fibonacci sequence is a linear recurrence, its members can be evaluated in closed form. This involves computing a power, which can be done in O(log n) similarly to the matrix-multiplication solution, but the constant overhead should be lower. That's the fastest algorithm I know:

F(n) = (phi^n - (1 - phi)^n) / sqrt(5),    phi = (1 + sqrt(5)) / 2
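
A minimal sketch of evaluating that closed form directly, in floating point (so only approximate once F(n) exceeds double precision):

from math import sqrt

def fib_closed_form(n):
    phi = (1 + sqrt(5)) / 2
    # O(log n) because of the power, but only exact while F(n) fits in a double
    return round(phi**n / sqrt(5))

print(fib_closed_form(10))  # 55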

EDIT

Sorry, I missed the "exact" part. Another exact O(log(n)) alternative to the matrix multiplication can be calculated as follows:

F(2k) = F(k) * (2*F(k+1) - F(k))
F(2k+1) = F(k+1)^2 + F(k)^2

from functools import lru_cache

@lru_cache(None)
def fib(n):
    # note the indexing: fib(0) = fib(1) = 1, so fib(n) = F(n+1) in the
    # usual convention where F(0) = 0, F(1) = 1
    if n in (0, 1):
        return 1
    if n & 1:  # n is odd; the bit test is faster than checking with modulo
        # F(2k) = F(k) * (2*F(k+1) - F(k))
        return fib((n+1)//2 - 1) * (2*fib((n+1)//2) - fib((n+1)//2 - 1))
    # F(2k+1) = F(k)^2 + F(k+1)^2
    a, b = fib(n//2 - 1), fib(n//2)
    return a**2 + b**2

This is based on a derivation from a note by Prof. Edsger Dijkstra. The solution exploits the fact that to calculate both F(2N) and F(2N-1) you only need to know F(N) and F(N-1). Nevertheless, you are still dealing with long-number arithmetic, though the overhead should be smaller than that of the matrix-based solution. In Python you would be better off rewriting this in imperative style due to the slow memoization and recursion, though I wrote it this way for the clarity of the functional formulation.
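
A sketch of such an imperative rewrite of the same doubling identities (illustrative, not the memoized version above) walks the bits of n from the most significant one down, keeping only the pair (F(k), F(k+1)):

def fib_fast_doubling(n):
    """Return F(n) with the usual indexing F(0) = 0, F(1) = 1."""
    a, b = 0, 1                  # F(0), F(1)
    for bit in bin(n)[2:]:       # most significant bit first
        c = a * (2 * b - a)      # F(2k)   = F(k) * (2*F(k+1) - F(k))
        d = a * a + b * b        # F(2k+1) = F(k)^2 + F(k+1)^2
        if bit == '1':
            a, b = d, c + d      # k -> 2k + 1
        else:
            a, b = c, d          # k -> 2k
    return a

print(fib_fast_doubling(10))                 # 55
print(len(str(fib_fast_doubling(1000000))))  # about 208,988 digits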

Answered by Eli Korvigo