
Why is numba faster than numpy here?

I can't figure out why numba is beating numpy here (by over 3x). Did I make some fundamental error in how I am benchmarking? It seems like the perfect situation for numpy, no? Note that as a check, I also ran a variation combining numba and numpy (not shown), which as expected gave the same timing as running numpy without numba.

(btw this is a followup question to: Fastest way to numerically process 2d-array: dataframe vs series vs array vs numba )

import numpy as np
from numba import jit
nobs = 10000 

def proc_numpy(x,y,z):

   x = x*2 - ( y * 55 )      # these 4 lines represent use cases
   y = x + y*2               # where the processing time is mostly
   z = x + y + 99            # a function of, say, 50 to 200 lines
   z = z * ( z - .88 )       # of fairly simple numerical operations

   return z

@jit
def proc_numba(xx,yy,zz):
   for j in range(nobs):     # as pointed out by Llopis, this for loop 
      x, y = xx[j], yy[j]    # is not needed here.  it is here by 
                             # accident because in the original benchmarks 
      x = x*2 - ( y * 55 )   # I was doing data creation inside the function 
      y = x + y*2            # instead of passing it in as an array
      z = x + y + 99         # in any case, this redundant code seems to 
      z = z * ( z - .88 )    # have something to do with the code running
                             # faster.  without the redundant code, the 
      zz[j] = z              # numba and numpy functions are exactly the same.
   return zz

x = np.random.randn(nobs)
y = np.random.randn(nobs)
z = np.zeros(nobs)
res_numpy = proc_numpy(x,y,z)

z = np.zeros(nobs)
res_numba = proc_numba(x,y,z)

results:

In [356]: np.all( res_numpy == res_numba )
Out[356]: True

In [357]: %timeit proc_numpy(x,y,z)
10000 loops, best of 3: 105 µs per loop

In [358]: %timeit proc_numba(x,y,z)
10000 loops, best of 3: 28.6 µs per loop

I ran this on a 2012 MacBook Air (13.3), standard Anaconda distribution. I can provide more detail on my setup if it's relevant.
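That combined variation was essentially the numpy function wrapped in @jit, roughly along these lines (the name proc_numba_numpy and the exact body are a sketch of the idea, not the code that was actually run):

@jit
def proc_numba_numpy(x,y,z):
   # whole-array numpy expressions inside a @jit function; numba ends up
   # deferring to the same pre-compiled numpy kernels, which is consistent
   # with the timing matching plain numpy
   x = x*2 - ( y * 55 )
   y = x + y*2
   z = x + y + 99
   z = z * ( z - .88 )
   return z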

asked Sep 20 '14 by JohnE

3 Answers

I think this question highlights (somewhat) the limitations of calling out to precompiled functions from a higher level language. Suppose in C++ you write something like:

for (int i = 0; i != N; ++i) a[i] = b[i] + c[i] + 2 * d[i];

The compiler sees the whole expression at compile time. It can do a lot of really intelligent things here, including optimizing out temporaries (and loop unrolling).

In Python, however, consider what's happening: when you use numpy, each "+" uses operator overloading on the np array types (which are just thin wrappers around contiguous blocks of memory, i.e. arrays in the low-level sense) and calls out to a Fortran (or C++) function that does the addition super fast. But it does just one addition and spits out a temporary.

So while numpy is awesome, convenient, and pretty fast, in a way it is also slowing things down: it looks like it is calling into a fast compiled language for the hard work, but the compiler never gets to see the whole program, it is just fed isolated little bits. That is hugely detrimental to a compiler, especially a modern one, which is very intelligent and can retire multiple instructions per cycle when the code is well written.

Numba, on the other hand, uses a JIT. So at runtime it can figure out that the temporaries are not needed and optimize them away. Basically, Numba gets a chance to have the program compiled as a whole, whereas numpy can only call small atomic blocks which have themselves been pre-compiled.
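To make the contrast concrete, here is a minimal sketch (the function names are just for illustration): the numpy version evaluates the expression one pre-compiled kernel at a time, each producing a full-size temporary, while the @jit version is compiled as a single loop whose intermediates stay in registers.

import numpy as np
from numba import jit

def expr_numpy(x, y):
    # each operator dispatches to a separate pre-compiled ufunc, and each
    # one allocates a full-size temporary array before the next step runs
    return x*2 - ( y * 55 )          # roughly: t1 = x*2; t2 = y*55; t1 - t2

@jit
def expr_numba(x, y, out):
    # the whole loop body is compiled as one unit, so x[i]*2 and y[i]*55
    # are plain scalars that never touch memory as separate arrays
    for i in range(x.shape[0]):
        out[i] = x[i]*2 - y[i]*55
    return out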

answered Oct 12 '22 by Nir Friedman


When you ask numpy to do:

x = x*2 - ( y * 55 )

It is internally translated to something like:

tmp1 = y * 55
tmp2 = x * 2
tmp3 = tmp2 - tmp1
x = tmp3

Each of those temps is an array that has to be allocated, operated on, and then deallocated. Numba, on the other hand, handles things one item at a time and doesn't have to deal with that overhead.
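As a rough illustration (this helper is hypothetical, not part of the answer), you can make those temporaries explicit yourself with numpy's out= arguments; pre-allocating and reusing the buffers removes the allocate/deallocate churn, which is part of what numba gets for free:

import numpy as np

def one_line_with_explicit_temps(x, y, tmp1, tmp2):
    # tmp1 and tmp2 are pre-allocated scratch arrays standing in for the
    # temporaries numpy would otherwise create and destroy on every call
    np.multiply(y, 55, out=tmp1)       # tmp1 = y * 55
    np.multiply(x, 2, out=tmp2)        # tmp2 = x * 2
    np.subtract(tmp2, tmp1, out=tmp2)  # tmp2 = x*2 - y*55
    return tmp2

Even with the allocations removed, each line is still a separate pass over the data, so this only closes part of the gap.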

answered Oct 12 '22 by Jaime


Numba is generally faster than Numpy and even Cython (at least on Linux).

Here's a plot (stolen from Numba vs. Cython: Take 2): [plot: benchmark of NumPy, Cython and Numba]

In this benchmark pairwise distances were computed, so the result may well depend on the algorithm.
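For context, the kernel in that benchmark is a pairwise-distance computation along these lines (a sketch of the typical implementations, not the exact code from the linked post):

import numpy as np
from numba import jit

@jit
def pairwise_numba(X, D):
    # explicit triple loop; numba compiles the whole thing to machine code
    # with no intermediate arrays
    n, m = X.shape
    for i in range(n):
        for j in range(n):
            d = 0.0
            for k in range(m):
                diff = X[i, k] - X[j, k]
                d += diff * diff
            D[i, j] = np.sqrt(d)
    return D

def pairwise_numpy(X):
    # broadcasting version: concise, but it materializes an (n, n, m)
    # intermediate array before reducing it
    return np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))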

Note that this may differ on other platforms; see this plot for WinPython (from the WinPython Cython tutorial):

[plot: benchmark of NumPy, Cython and Numba with WinPython]

answered Oct 12 '22 by sebix