
Why is numba faster than numpy here?

I can't figure out why numba is beating numpy here (by over 3x). Did I make some fundamental error in how I am benchmarking? It seems like the perfect situation for numpy, no? Note that as a check, I also ran a variation combining numba and numpy (not shown), which as expected gave the same timing as running numpy without numba.

(btw this is a followup question to: Fastest way to numerically process 2d-array: dataframe vs series vs array vs numba )

import numpy as np
from numba import jit
nobs = 10000 

def proc_numpy(x,y,z):

   x = x*2 - ( y * 55 )      # these 4 lines represent use cases
   y = x + y*2               # where the processing time is mostly
   z = x + y + 99            # a function of, say, 50 to 200 lines
   z = z * ( z - .88 )       # of fairly simple numerical operations

   return z

@jit
def proc_numba(xx,yy,zz):
   for j in range(nobs):     # as pointed out by Llopis, this for loop 
      x, y = xx[j], yy[j]    # is not needed here.  it is here by 
                             # accident because in the original benchmarks 
      x = x*2 - ( y * 55 )   # I was doing data creation inside the function 
      y = x + y*2            # instead of passing it in as an array
      z = x + y + 99         # in any case, this redundant code seems to 
      z = z * ( z - .88 )    # have something to do with the code running
                             # faster.  without the redundant code, the 
      zz[j] = z              # numba and numpy functions are exactly the same.
   return zz

x = np.random.randn(nobs)
y = np.random.randn(nobs)
z = np.zeros(nobs)
res_numpy = proc_numpy(x,y,z)

z = np.zeros(nobs)
res_numba = proc_numba(x,y,z)

results:

In [356]: np.all( res_numpy == res_numba )
Out[356]: True

In [357]: %timeit proc_numpy(x,y,z)
10000 loops, best of 3: 105 µs per loop

In [358]: %timeit proc_numba(x,y,z)
10000 loops, best of 3: 28.6 µs per loop

I ran this on a 2012 MacBook Air (13.3), standard Anaconda distribution. I can provide more detail on my setup if it's relevant.
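That combined variation was essentially the numpy function wrapped in @jit, roughly along these lines (the name proc_numba_numpy and the exact body are a sketch of the idea, not the code that was actually run):

@jit
def proc_numba_numpy(x,y,z):
   # whole-array numpy expressions inside a @jit function; numba ends up
   # deferring to the same pre-compiled numpy kernels, which is consistent
   # with the timing matching plain numpy
   x = x*2 - ( y * 55 )
   y = x + y*2
   z = x + y + 99
   z = z * ( z - .88 )
   return z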

asked Sep 20 '14 by JohnE

3 Answers

I think this question highlights (somewhat) the limitations of calling out to precompiled functions from a higher level language. Suppose in C++ you write something like:

for (int i = 0; i != N; ++i) a[i] = b[i] + c[i] + 2 * d[i];

The compiler sees the whole expression at compile time. It can do a lot of really intelligent things here, including optimizing out temporaries (and loop unrolling).

In Python, however, consider what's happening: when you use numpy, each "+" uses operator overloading on the np array types (which are just thin wrappers around contiguous blocks of memory, i.e. arrays in the low-level sense) and calls out to a Fortran (or C++) function that does the addition super fast. But it does just one addition and spits out a temporary.

So while numpy is awesome, convenient, and pretty fast, in a way it is also slowing things down: it looks like it is calling into a fast compiled language for the hard work, but the compiler never gets to see the whole program, it is just fed isolated little bits. That is hugely detrimental to a compiler, especially a modern one, which is very intelligent and can retire multiple instructions per cycle when the code is well written.

Numba, on the other hand, uses a JIT. So at runtime it can figure out that the temporaries are not needed and optimize them away. Basically, Numba gets a chance to have the program compiled as a whole, whereas numpy can only call small atomic blocks which have themselves been pre-compiled.
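To make the contrast concrete, here is a minimal sketch (the function names are just for illustration): the numpy version evaluates the expression one pre-compiled kernel at a time, each producing a full-size temporary, while the @jit version is compiled as a single loop whose intermediates stay in registers.

import numpy as np
from numba import jit

def expr_numpy(x, y):
    # each operator dispatches to a separate pre-compiled ufunc, and each
    # one allocates a full-size temporary array before the next step runs
    return x*2 - ( y * 55 )          # roughly: t1 = x*2; t2 = y*55; t1 - t2

@jit
def expr_numba(x, y, out):
    # the whole loop body is compiled as one unit, so x[i]*2 and y[i]*55
    # are plain scalars that never touch memory as separate arrays
    for i in range(x.shape[0]):
        out[i] = x[i]*2 - y[i]*55
    return out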

answered Oct 12 '22 by Nir Friedman


When you ask numpy to do:

x = x*2 - ( y * 55 )

It is internally translated to something like:

tmp1 = y * 55
tmp2 = x * 2
tmp3 = tmp2 - tmp1
x = tmp3

Each of those temps is an array that has to be allocated, operated on, and then deallocated. Numba, on the other hand, handles things one item at a time and doesn't have to deal with that overhead.
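As a rough illustration (this helper is hypothetical, not part of the answer), you can make those temporaries explicit yourself with numpy's out= arguments; pre-allocating and reusing the buffers removes the allocate/deallocate churn, which is part of what numba gets for free:

import numpy as np

def one_line_with_explicit_temps(x, y, tmp1, tmp2):
    # tmp1 and tmp2 are pre-allocated scratch arrays standing in for the
    # temporaries numpy would otherwise create and destroy on every call
    np.multiply(y, 55, out=tmp1)       # tmp1 = y * 55
    np.multiply(x, 2, out=tmp2)        # tmp2 = x * 2
    np.subtract(tmp2, tmp1, out=tmp2)  # tmp2 = x*2 - y*55
    return tmp2

Even with the allocations removed, each line is still a separate pass over the data, so this only closes part of the gap.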

answered Oct 12 '22 by Jaime


Numba is generally faster than Numpy and even Cython (at least on Linux).

Here's a plot (stolen from Numba vs. Cython: Take 2): [plot: benchmark of NumPy, Cython and Numba]

In this benchmark pairwise distances were computed, so the result may well depend on the algorithm.
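For context, the kernel in that benchmark is a pairwise-distance computation along these lines (a sketch of the typical implementations, not the exact code from the linked post):

import numpy as np
from numba import jit

@jit
def pairwise_numba(X, D):
    # explicit triple loop; numba compiles the whole thing to machine code
    # with no intermediate arrays
    n, m = X.shape
    for i in range(n):
        for j in range(n):
            d = 0.0
            for k in range(m):
                diff = X[i, k] - X[j, k]
                d += diff * diff
            D[i, j] = np.sqrt(d)
    return D

def pairwise_numpy(X):
    # broadcasting version: concise, but it materializes an (n, n, m)
    # intermediate array before reducing it
    return np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))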

Note that this may differ on other platforms; see this plot for WinPython (from the WinPython Cython tutorial):

[plot: benchmark of NumPy, Cython and Numba with WinPython]

answered Oct 12 '22 by sebix