Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Summing with a for loop faster than with reduce?

I wanted to see how much faster reduce was than using a for loop for simple numerical operations. Here's what I found (using the standard timeit library):

In [54]: print(setup)
from operator import add, iadd
r = range(100)

In [55]: print(stmt1)    
c = 0
for i in r:
    c+=i        

In [56]: timeit(stmt1, setup)
Out[56]: 8.948904991149902
In [58]: print(stmt3)    
reduce(add, r)    

In [59]: timeit(stmt3, setup)
Out[59]: 13.316915035247803

Looking a little more:

In [68]: timeit("1+2", setup)
Out[68]: 0.04145693778991699

In [69]: timeit("add(1,2)", setup)
Out[69]: 0.22807812690734863

What's going on here? Obviously, reduce does loop faster than for, but the function call seems to dominate. Shouldn't the reduce version run almost entirely in C? Using iadd(c,i) in the for loop version makes it run in ~24 seconds. Why would using operator.add be so much slower than +? I was under the impression + and operator.add run the same C code (I checked to make sure operator.add wasn't just calling + in python or anything).

BTW, just using sum runs in ~2.3 seconds.

In [70]: print(sys.version)
2.7.1 (r271:86882M, Nov 30 2010, 09:39:13) 
[GCC 4.0.1 (Apple Inc. build 5494)]
like image 802
Bryan Head Avatar asked Mar 25 '11 18:03

Bryan Head


1 Answers

The reduce(add, r) must invoke the add() function 100 times, so the overhead of the function calls adds up -- reduce uses PyEval_CallObject to invoke add on each iteration:

for (;;) {
    ...
    if (result == NULL)
        result = op2;
    else {
        # here it is creating a tuple to pass the previous result and the next
        # value from range(100) into func add():
        PyTuple_SetItem(args, 0, result);
        PyTuple_SetItem(args, 1, op2);
        if ((result = PyEval_CallObject(func, args)) == NULL)
            goto Fail;
    }

Updated: Response to question in comments.

When you type 1 + 2 in Python source code, the bytecode compiler performs the addition in place and replaces that expression with 3:

f1 = lambda: 1 + 2
c1 = byteplay.Code.from_code(f1.func_code)
print c1.code

1           1 LOAD_CONST           3
            2 RETURN_VALUE         

If you add two variables a + b the compiler will generate bytecode which loads the two variables and performs a BINARY_ADD, which is far faster than calling a function to perform the addition:

f2 = lambda a, b: a + b
c2 = byteplay.Code.from_code(f2.func_code)
print c2.code

1           1 LOAD_FAST            a
            2 LOAD_FAST            b
            3 BINARY_ADD           
            4 RETURN_VALUE         
like image 130
samplebias Avatar answered Oct 10 '22 22:10

samplebias