Python built-in sum function vs. for loop performance

Question

I noticed that Python's built-in sum function is roughly 3x faster than a for loop when summing a list of 1 000 000 integers:

import timeit

def sum1():
    s = 0
    for i in range(1000000):
        s += i
    return s

def sum2():
    return sum(range(1000000))

print 'For Loop Sum:', timeit.timeit(sum1, number=10)
print 'Built-in Sum:', timeit.timeit(sum2, number=10)

# Prints:
# For Loop Sum: 0.751425027847
# Built-in Sum: 0.266746997833

Why is that? How is sum implemented?

Martijn Pieters · Accepted Answer

The speed difference is actually greater than 3 times, but you slow down either version by first creating a huge in-memory list of 1 million integers. Separate that out of the time trials:

>>> import timeit
>>> def sum1(lst):
...     s = 0
...     for i in lst:
...         s += i
...     return s
... 
>>> def sum2(lst):
...     return sum(lst)
... 
>>> values = range(1000000)
>>> timeit.timeit('f(lst)', 'from __main__ import sum1 as f, values as lst', number=100)
3.457869052886963
>>> timeit.timeit('f(lst)', 'from __main__ import sum2 as f, values as lst', number=100)
0.6696369647979736

The speed difference has risen to over 5 times now.

A for loop is executed as interpreted Python bytecode. sum() loops entirely in C code. The speed difference between interpreted bytecode and C code is large.

In addition, the C code makes sure not to create new Python objects if it can keep the sum in C types instead; this works for int and float results.

The Python version, disassembled, does this:

>>> import dis
>>> def sum1():
...     s = 0
...     for i in range(1000000):
...         s += i
...     return s
... 
>>> dis.dis(sum1)
  2           0 LOAD_CONST               1 (0)
              3 STORE_FAST               0 (s)

  3           6 SETUP_LOOP              30 (to 39)
              9 LOAD_GLOBAL              0 (range)
             12 LOAD_CONST               2 (1000000)
             15 CALL_FUNCTION            1
             18 GET_ITER            
        >>   19 FOR_ITER                16 (to 38)
             22 STORE_FAST               1 (i)

  4          25 LOAD_FAST                0 (s)
             28 LOAD_FAST                1 (i)
             31 INPLACE_ADD         
             32 STORE_FAST               0 (s)
             35 JUMP_ABSOLUTE           19
        >>   38 POP_BLOCK           

  5     >>   39 LOAD_FAST                0 (s)
             42 RETURN_VALUE

Apart from the interpreter loop being slower than C, the INPLACE_ADD will create a new integer object (past 255, CPython caches small int objects as singletons).

You can see the C implementation in the Python mercurial code repository, but it explicitly states in the comments:

/* Fast addition by keeping temporary sums in C instead of new Python objects.
   Assumes all inputs are the same type.  If the assumption fails, default
   to the more general routine.
*/

DrV · Answer

As dwanderson suggested, Numpy is one alternative. It is, indeed, if you want to do some maths. See this benchmark:

import numpy as np

r = range(1000000)       # 12.5 ms
s = sum(r)               # 7.9 ms

ar = np.arange(1000000)  # 0.5 ms
as = np.sum(ar)          # 0.6 ms

So both creating the list and summing it is much faster with numpy. This is mostly because the numpy.array is designed for this and is much more efficient than the list.

However, if we have a python list, then numpy is very slow, as its conversion from a list into a numpy.array is sluggish:

r = range(1000000)
ar = np.array(r)         # 102 ms

Python built-in sum function vs. for loop performance

Tags:

performance

python

sum

Michal

2 Answers

Martijn Pieters

DrV

Recent Activity

Donate For Us

Python built-in sum function vs. for loop performance

Tags:

performance

python

sum

Michal

2 Answers

Martijn Pieters

DrV

Related questions

Recent Activity

Donate For Us