Why is this iterative Collatz method 30% slower than its recursive version in Python?

Prelude

I have two implementations for a particular problem, one recursive and one iterative, and I want to know what causes the iterative solution to be ~30% slower than the recursive one.

Given the recursive solution, I wrote an iterative solution that makes the stack explicit.
I am simply mimicking what the recursion does, so it is not surprising that the Python interpreter handles that bookkeeping more efficiently than my explicit version. But can we write an iterative method with comparable performance?

My case study is Problem #14 on Project Euler.

Find the longest Collatz chain with a starting number below one million.
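
For orientation: the Collatz rule maps n to n/2 when n is even and to 3n+1 when n is odd, and the chain length here counts every term including the starting number and the final 1 (which is why the solutions below seed their memo table with {1: 1}). A plain, unoptimized chain-length function might look like this (collatz_len is just an illustrative name, not used in the timed code):

def collatz_len(n):
    # Count the terms of the Collatz chain starting at n,
    # including n itself and the final 1.
    length = 1
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        length += 1
    return length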

Code

Here is a parsimonious recursive solution (credit due to veritas in the problem thread plus an optimization from jJjjJ):

def solve_PE14_recursive(ub=10**6):
    def collatz_r(n):
        if not n in table:
            if n % 2 == 0:
                # even: one halving step
                table[n] = collatz_r(n // 2) + 1
            elif n % 4 == 3:
                # odd, n % 4 == 3: 3n+1 is even, so combine two steps
                table[n] = collatz_r((3 * n + 1) // 2) + 2
            else:
                # odd, n % 4 == 1: 3n+1 is divisible by 4, so combine three steps
                table[n] = collatz_r((3 * n + 1) // 4) + 3
        return table[n]
    table = {1: 1}
    return max(xrange(ub // 2 + 1, ub, 2), key=collatz_r)

Here's my iterative version:

def solve_PE14_iterative(ub=10**6):
    def collatz_i(n):
        stack = []
        while not n in table:
            if n % 2 == 0:
                x, y = n // 2, 1
            elif n % 4 == 3:
                x, y = (3 * n + 1) // 2, 2
            else:
                x, y = (3 * n + 1) // 4, 3
            stack.append((n, y))
            n = x
        ysum = table[n]
        for x, y in reversed(stack):
            ysum += y
            table[x] = ysum
        return ysum
    table = {1: 1}
    return max(xrange(ub // 2 + 1, ub, 2), key=collatz_i)
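
Before timing, the two implementations can be checked against each other on a smaller bound (a quick sanity check, not part of the timed runs):

# Both implementations should pick the same starting number.
assert solve_PE14_recursive(10**4) == solve_PE14_iterative(10**4)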

And the timings on my machine (an i7 with plenty of memory), using IPython:

In [3]: %timeit solve_PE14_recursive()
1 loops, best of 3: 942 ms per loop
In [4]: %timeit solve_PE14_iterative()
1 loops, best of 3: 1.35 s per loop

Comments

The recursive solution is awesome:

  • Optimized to skip a step or two depending on the two least significant bits
    (a small check of this shortcut follows this list).
    My original solution didn't skip any Collatz steps and took ~1.86 s
  • It is difficult to hit Python's default recursion limit of 1000.
    collatz_r(9780657630) returns 1133 but requires fewer than 1000 recursive calls.
  • Memoization avoids retracing
  • collatz_r length calculated on-demand for max
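
Here is the small check of the bit-based shortcut mentioned in the first bullet (illustrative only; plain_step is a made-up helper, not part of the solutions):

def plain_step(n):
    # One ordinary Collatz step.
    return n // 2 if n % 2 == 0 else 3 * n + 1

for n in range(2, 200):
    if n % 2 == 0:
        shortcut, steps = n // 2, 1
    elif n % 4 == 3:
        shortcut, steps = (3 * n + 1) // 2, 2
    else:
        shortcut, steps = (3 * n + 1) // 4, 3
    m = n
    for _ in range(steps):
        m = plain_step(m)
    assert m == shortcut   # the shortcut equals 'steps' plain Collatz steps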

Playing around with it, timings seem to be repeatable to within about +/- 5 ms.
Statically typed languages such as C and Haskell can get timings below 100 ms.
I deliberately initialize the memoization table inside the function so that the timings reflect the "re-discovery" of the table values on each invocation.

collatz_r(2**1002) raises RuntimeError: maximum recursion depth exceeded.
collatz_i(2**1002) happily returns with 1003.
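
For completeness, the recursion limit itself could be raised with a standard library call, though the OS thread stack still caps how deep CPython can actually go; this is only an aside, not something the recursive solution needs for this problem:

import sys
sys.setrecursionlimit(5000)   # default is 1000; raise with care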

I am familiar with generators, coroutines, and decorators.
I am using Python 2.7. I am also happy to use Numpy (1.8 on my machine).

What I am looking for

  • an iterative solution that closes the performance gap
  • discussion on how Python handles recursion
  • the finer details of the performance penalties associated with an explicit stack

I'm looking mostly for the first, though the second and third are very important to this problem and would increase my understanding of Python.

Prashant Kumar, asked Feb 11 '14


2 Answers

Here's my shot at a (partial) explanation after running some benchmarks, which confirm your figures.

While recursive function calls are expensive in CPython, they aren't nearly as expensive as emulating a call stack using lists. The stack for a recursive call is a compact structure implemented in C (see Eli Bendersky's explanation and the file Python/ceval.c in the source code).

By contrast, your emulated stack is a Python list object, i.e. a heap-allocated, dynamically growing array of pointers to tuple objects, which in turn point to the actual values; goodbye, locality of reference, hello cache misses. You then use Python's notoriously slow iteration on these objects. A line-by-line profiling with kernprof confirms that iteration and list handling are taking a lot of time:

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    16                                               @profile
    17                                               def collatz_i(n):
    18    750000       339195      0.5      2.4          stack = []
    19   3702825      1996913      0.5     14.2          while not n in table:
    20   2952825      1329819      0.5      9.5              if n % 2 == 0:
    21    864633       416307      0.5      3.0                  x, y = n // 2, 1
    22   2088192       906202      0.4      6.4              elif n % 4 == 3:
    23   1043583       617536      0.6      4.4                  x, y = (3 * n + 1) // 2, 2
    24                                                       else:
    25   1044609       601008      0.6      4.3                  x, y = (3 * n + 1) // 4, 3
    26   2952825      1543300      0.5     11.0              stack.append((n, y))
    27   2952825      1150867      0.4      8.2              n = x
    28    750000       352395      0.5      2.5          ysum = table[n]
    29   3702825      1693252      0.5     12.0          for x, y in reversed(stack):
    30   2952825      1254553      0.4      8.9              ysum += y
    31   2952825      1560177      0.5     11.1              table[x] = ysum
    32    750000       305911      0.4      2.2          return ysum

Interestingly, even n = x takes around 8% of the total running time.

(Unfortunately, I couldn't get kernprof to produce something similar for the recursive version.)
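
(For anyone who wants to reproduce the table: it comes from the line_profiler package. Roughly, you put @profile on collatz_i, the decorator is injected by kernprof at run time, and you run the script through kernprof; profile_me.py is just a placeholder file name here.)

# pip install line_profiler
# Decorate the function of interest with @profile (injected by kernprof), then:
#     kernprof -l -v profile_me.py      # 'kernprof.py -l -v' on older releases
# -l requests line-by-line profiling, -v prints the per-line table after the run.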

— answered by Fred Foo


Iterative code is sometimes faster than recursive because it avoids function call overhead. However, stack.append is also a function call (and an attribute lookup on top) and adds similar overhead. Counting the append calls, the iterative version makes just as many function calls as the recursive version.

Comparing the first two and the last two timings here...

$ python -m timeit pass
10000000 loops, best of 3: 0.0242 usec per loop
$ python -m timeit -s "def f(n): pass" "f(1)"
10000000 loops, best of 3: 0.188 usec per loop
$ python -m timeit -s "def f(n): x=[]" "f(1)"
1000000 loops, best of 3: 0.234 usec per loop
$ python -m timeit -s "def f(n): x=[]; x.append" "f(1)"
1000000 loops, best of 3: 0.336 usec per loop
$ python -m timeit -s "def f(n): x=[]; x.append(1)" "f(1)"
1000000 loops, best of 3: 0.499 usec per loop

...confirms that the append call itself (excluding the attribute lookup) takes approximately the same time as calling a minimal pure Python function: roughly 170 ns in both cases.


From the above I conclude that the iterative version does not enjoy an inherent advantage. The next question to consider is which one does more work. To get a (very) rough estimate, we can look at the number of lines executed in each version. I did a quick experiment to find out that:

  • collatz_r is called 1234275 times, and the body of the if block executes 984275 times.
  • collatz_i is called 250000 times, and the while loop body runs 984275 times.
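
One quick way to obtain counts like these (only a sketch; the counter and function names are made up, and table is passed explicitly to keep it self-contained):

call_count = [0, 0]   # [calls to collatz_r_counted, executions of the if body]

def collatz_r_counted(n, table):
    call_count[0] += 1
    if n not in table:
        call_count[1] += 1
        if n % 2 == 0:
            table[n] = collatz_r_counted(n // 2, table) + 1
        elif n % 4 == 3:
            table[n] = collatz_r_counted((3 * n + 1) // 2, table) + 2
        else:
            table[n] = collatz_r_counted((3 * n + 1) // 4, table) + 3
    return table[n]

table = {1: 1}
max(xrange(10**6 // 2 + 1, 10**6, 2), key=lambda n: collatz_r_counted(n, table))
print(call_count)   # compare with the figures quoted above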

Now, let's say collatz_r has 2 lines outside the if and 4 lines inside (executed in the worst case, when the else branch is hit). That adds up to about 6.4 million lines to execute. The comparable figures for collatz_i could be 5 and 9, which add up to roughly 10 million.
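
Spelled out, the arithmetic behind those totals is:

recursive_lines = 1234275 * 2 + 984275 * 4   # about 6.4 million
iterative_lines = 250000 * 5 + 984275 * 9    # about 10.1 million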

Even if this is just a rough estimate, it is well in line with the actual timings.
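
As an aside suggested by the attribute-lookup measurements above, one common micro-optimization is to bind stack.append to a local name before the loop so the lookup leaves the hot path. This is only a sketch (collatz_i_hoisted is a made-up name, and table is passed explicitly); it trims some overhead but does not by itself close the whole gap:

def collatz_i_hoisted(n, table):
    stack = []
    push = stack.append           # hoist the attribute lookup out of the loop
    while n not in table:
        if n % 2 == 0:
            x, y = n // 2, 1
        elif n % 4 == 3:
            x, y = (3 * n + 1) // 2, 2
        else:
            x, y = (3 * n + 1) // 4, 3
        push((n, y))
        n = x
    ysum = table[n]
    for x, y in reversed(stack):
        ysum += y
        table[x] = ysum
    return ysum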

— answered by Janne Karila