TLDR: in cython, why (or when?) is iterating over a numpy array faster than iterating over a python list?
Generally: I've used Cython before and was able to get tremendous speed ups over naive python impl', However, figuring out what exactly needs to be done seems non-trivial.
Consider the following 3 implementations of a sum() function. They reside in a cython file called 'cy' (obviously, there's np.sum(), but that's besides my point..)
Naive python:
def sum_naive(A):
s = 0
for a in A:
s += a
return s
Cython with a function that expects a python list:
def sum_list(A):
cdef unsigned long s = 0
for a in A:
s += a
return s
Cython with a function that expects a numpy array.
def sum_np(np.ndarray[np.int64_t, ndim=1] A):
cdef unsigned long s = 0
for a in A:
s += a
return s
I would expect that in terms of running time, sum_np < sum_list < sum_naive, however, the following script demonstrates to the contrary (for completeness, I added np.sum() )
N = 1000000
v_np = np.array(range(N))
v_list = range(N)
%timeit cy.sum_naive(v_list)
%timeit cy.sum_naive(v_np)
%timeit cy.sum_list(v_list)
%timeit cy.sum_np(v_np)
%timeit v_np.sum()
with results:
In [18]: %timeit cyMatching.sum_naive(v_list)
100 loops, best of 3: 18.7 ms per loop
In [19]: %timeit cyMatching.sum_naive(v_np)
1 loops, best of 3: 389 ms per loop
In [20]: %timeit cyMatching.sum_list(v_list)
10 loops, best of 3: 82.9 ms per loop
In [21]: %timeit cyMatching.sum_np(v_np)
1 loops, best of 3: 1.14 s per loop
In [22]: %timeit v_np.sum()
1000 loops, best of 3: 659 us per loop
What's going on? Why is cython+numpy slow?
P.S.
I do use
#cython: boundscheck=False
#cython: wraparound=False
NumPy Arrays Are Faster Than Lists As predicted, we can see that NumPy arrays are significantly faster than lists.
NumPy is mostly written in C. The main advantage of Python is that there are a number of ways of very easily extending your code with C (ctypes, swig,f2py) / C++ (boost. python, weave.
There is a better way to implement this in cython that at least on my machine beats np.sum
because it avoids type checking and other things that numpy normally has to do when dealing with an arbitrary array:
#cython.wraparound=False
#cython.boundscheck=False
cimport numpy as np
def sum_np(np.ndarray[np.int64_t, ndim=1] A):
cdef unsigned long s = 0
for a in A:
s += a
return s
def sum_np2(np.int64_t[::1] A):
cdef:
unsigned long s = 0
size_t k
for k in range(A.shape[0]):
s += A[k]
return s
And then the timings:
N = 1000000
v_np = np.array(range(N))
v_list = range(N)
%timeit sum(v_list)
%timeit sum_naive(v_list)
%timeit np.sum(v_np)
%timeit sum_np(v_np)
%timeit sum_np2(v_np)
10 loops, best of 3: 19.5 ms per loop
10 loops, best of 3: 64.9 ms per loop
1000 loops, best of 3: 1.62 ms per loop
1 loops, best of 3: 1.7 s per loop
1000 loops, best of 3: 1.42 ms per loop
You don't want to iterate over the numpy array via the Python style, but rather access elements using indexing as that it can be translated into pure C, rather than relying on the Python API.
a
is untyped and thus there will be lots of conversions from Python to C types and back. These can be slow.
JoshAdel correctly pointed out that instead of iterating though, you should iterate over a range. Cython will convert the indexing to C, which is fast.
Using cython -a myfile.pyx
will highlight these sorts of things for you; you want all of your loop logic to be white for maximum speed.
PS: Note that np.ndarray[np.int64_t, ndim=1]
is outdated and has been deprecated in favour of the faster and more general long[:]
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With