Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is x**3 slower than x*x*x? [duplicate]

Tags:

python

numpy

In NumPy, x*x*x is an order of magnitude faster than x**3 or even np.power(x, 3).

x = np.random.rand(1e6) %timeit x**3 100 loops, best of 3: 7.07 ms per loop  %timeit x*x*x 10000 loops, best of 3: 163 µs per loop  %timeit np.power(x, 3) 100 loops, best of 3: 7.15 ms per loop 

Any ideas as to why this behavior happens? As far as I can tell all three yield the same output (checked with np.allclose).

like image 683
Ryan G. Avatar asked Aug 26 '13 22:08

Ryan G.


People also ask

Why is Python 3 slower than Python 2?

The difference is in the implementation of the int type. Python 3. x uses the arbitrary-sized integer type ( long in 2. x) exclusively, while in Python 2.

Is is faster than == in Python?

Not all strings are automatically interned in python. But, we can force strings to be interned by using the sys. intern(). So, Use 'is' (identity) operator than '==' (equality) operator to speed up your heavy string comparison operations.


2 Answers

As per this answer, it's because the implementation of exponentiation has some overhead that multiplication does not. However, naive multiplication will get slower and slower as the exponent increases. An empirical demonstration:

 In [3]: x = np.random.rand(1e6)   In [15]: %timeit x**2  100 loops, best of 3: 11.9 ms per loop   In [16]: %timeit x*x  100 loops, best of 3: 12.7 ms per loop   In [17]: %timeit x**3  10 loops, best of 3: 132 ms per loop   In [18]: %timeit x*x*x  10 loops, best of 3: 27.2 ms per loop   In [19]: %timeit x**4  10 loops, best of 3: 132 ms per loop   In [20]: %timeit x*x*x*x  10 loops, best of 3: 42.4 ms per loop   In [21]: %timeit x**10  10 loops, best of 3: 132 ms per loop   In [22]: %timeit x*x*x*x*x*x*x*x*x*x  10 loops, best of 3: 137 ms per loop   In [24]: %timeit x**15  10 loops, best of 3: 132 ms per loop   In [25]: %timeit x*x*x*x*x*x*x*x*x*x*x*x*x*x*x  1 loops, best of 3: 212 ms per loop 

Note the exponentiation time stays more or less constant, except for the x**2 case which I suspect is special-cased, while multiplication gets slower and slower. It seems you could exploit this to get faster integer exponentiation... for example:

In [26]: %timeit x**16 10 loops, best of 3: 132 ms per loop  In [27]: %timeit x*x*x*x*x*x*x*x*x*x*x*x*x*x*x*x 1 loops, best of 3: 225 ms per loop  In [28]: def tosixteenth(x):    ....:     x2 = x*x    ....:     x4 = x2*x2    ....:     x8 = x4*x4    ....:     x16 = x8*x8    ....:     return x16    ....:  In [29]: %timeit tosixteenth(x) 10 loops, best of 3: 49.5 ms per loop 

It seems you could apply this technique generically by splitting any integer into a sum of the powers of two, computing each power of two as above, and summing:

In [93]: %paste def smartintexp(x, exp):     result = np.ones(len(x))     curexp = np.array(x)     while True:         if exp%2 == 1:             result *= curexp         exp >>= 1         if not exp: break         curexp *= curexp     return result ## -- End pasted text --  In [94]: x Out[94]: array([ 0.0163407 ,  0.57694587,  0.47336487, ...,  0.70255032,         0.62043303,  0.0796748 ])  In [99]: x**21 Out[99]: array([  3.01080670e-38,   9.63466181e-06,   1.51048544e-07, ...,          6.02873388e-04,   4.43193256e-05,   8.46721060e-24])  In [100]: smartintexp(x, 21) Out[100]: array([  3.01080670e-38,   9.63466181e-06,   1.51048544e-07, ...,          6.02873388e-04,   4.43193256e-05,   8.46721060e-24])  In [101]: %timeit x**21 10 loops, best of 3: 132 ms per loop  In [102]: %timeit smartintexp(x, 21) 10 loops, best of 3: 70.7 ms per loop 

It's fast for small even powers of two:

In [106]: %timeit x**32 10 loops, best of 3: 131 ms per loop  In [107]: %timeit smartintexp(x, 32) 10 loops, best of 3: 57.4 ms per loop 

But gets slower as the exponent gets larger:

In [97]: %timeit x**63 10 loops, best of 3: 133 ms per loop  In [98]: %timeit smartintexp(x, 63) 10 loops, best of 3: 110 ms per loop 

And not faster for large worst-cases:

In [115]: %timeit x**511 10 loops, best of 3: 135 ms per loop  In [114]: %timeit smartintexp(x, 511) 10 loops, best of 3: 192 ms per loop 
like image 169
Claudiu Avatar answered Sep 22 '22 21:09

Claudiu


As a note if you are calculating powers and are worried about speed:

x = np.random.rand(5e7)  %timeit x*x*x 1 loops, best of 3: 522 ms per loop  %timeit np.einsum('i,i,i->i',x,x,x) 1 loops, best of 3: 288 ms per loop 

Why einsum is faster is still an open question of mine. Although its like due to einsum able to use SSE2 while numpy's ufuncs will not until 1.8.

In place is even faster:

def calc_power(arr):     for x in xrange(arr.shape[0]):         arr[x]=arr[x]*arr[x]*arr[x] numba_power = autojit(calc_power)  %timeit numba_power(x) 10 loops, best of 3: 51.5 ms per loop  %timeit np.einsum('i,i,i->i',x,x,x,out=x) 10 loops, best of 3: 111 ms per loop  %timeit np.power(x,3,out=x) 1 loops, best of 3: 609 ms per loop 
like image 21
Daniel Avatar answered Sep 20 '22 21:09

Daniel