TLDR: in cython, why (or when?) is iterating over a numpy array faster than iterating over a python list? Generally: I've used Cython before and was able to get tremendous speed ups over naive python impl', However, figuring out what exactly needs to be done seems non-trivial. Consider the following 3 implementations of a sum() function. They reside in a cython file called 'cy' (obviously, there's np.sum(), but that's besides my point..) Naive python: <pre class="prettyprint"><code>def sum_naive(A): s = 0 for a in A: s += a return s </code></pre> Cython with a function that expects a python list: <pre class="prettyprint"><code>def sum_list(A): cdef unsigned long s = 0 for a in A: s += a return s </code></pre> Cython with a function that expects a numpy array. <pre class="prettyprint"><code>def sum_np(np.ndarray[np.int64_t, ndim=1] A): cdef unsigned long s = 0 for a in A: s += a return s </code></pre> I would expect that in terms of running time, sum_np < sum_list < sum_naive, however, the following script demonstrates to the contrary (for completeness, I added np.sum() ) <pre class="prettyprint"><code>N = 1000000 v_np = np.array(range(N)) v_list = range(N) %timeit cy.sum_naive(v_list) %timeit cy.sum_naive(v_np) %timeit cy.sum_list(v_list) %timeit cy.sum_np(v_np) %timeit v_np.sum() </code></pre> with results: <pre class="prettyprint"><code>In [18]: %timeit cyMatching.sum_naive(v_list) 100 loops, best of 3: 18.7 ms per loop In [19]: %timeit cyMatching.sum_naive(v_np) 1 loops, best of 3: 389 ms per loop In [20]: %timeit cyMatching.sum_list(v_list) 10 loops, best of 3: 82.9 ms per loop In [21]: %timeit cyMatching.sum_np(v_np) 1 loops, best of 3: 1.14 s per loop In [22]: %timeit v_np.sum() 1000 loops, best of 3: 659 us per loop </code></pre> What's going on? Why is cython+numpy slow? P.S. I do use #cython: boundscheck=False #cython: wraparound=False

There is a better way to implement this in cython that at least on my machine beats <code>np.sum</code> because it avoids type checking and other things that numpy normally has to do when dealing with an arbitrary array: <pre class="prettyprint"><code>#cython.wraparound=False #cython.boundscheck=False cimport numpy as np def sum_np(np.ndarray[np.int64_t, ndim=1] A): cdef unsigned long s = 0 for a in A: s += a return s def sum_np2(np.int64_t[::1] A): cdef: unsigned long s = 0 size_t k for k in range(A.shape[0]): s += A[k] return s </code></pre> And then the timings: <pre class="prettyprint"><code>N = 1000000 v_np = np.array(range(N)) v_list = range(N) </code></pre> <hr> <pre class="prettyprint"><code>%timeit sum(v_list) %timeit sum_naive(v_list) %timeit np.sum(v_np) %timeit sum_np(v_np) %timeit sum_np2(v_np) 10 loops, best of 3: 19.5 ms per loop 10 loops, best of 3: 64.9 ms per loop 1000 loops, best of 3: 1.62 ms per loop 1 loops, best of 3: 1.7 s per loop 1000 loops, best of 3: 1.42 ms per loop </code></pre> You don't want to iterate over the numpy array via the Python style, but rather access elements using indexing as that it can be translated into pure C, rather than relying on the Python API.

<strike><code>a</code> is untyped and thus there will be lots of conversions from Python to C types and back. These can be slow.</strike> JoshAdel correctly pointed out that instead of iterating though, you should iterate over a range. Cython will convert the indexing to C, which is fast. <hr> Using <code>cython -a myfile.pyx</code> will highlight these sorts of things for you; you want all of your loop logic to be white for maximum speed. PS: Note that <code>np.ndarray[np.int64_t, ndim=1]</code> is outdated and has been deprecated in favour of the faster and more general <code>long[:]</code>.

Iterating over arrays in cython, is list faster than np.array?

Q: Are NP arrays faster than lists?

NumPy Arrays Are Faster Than Lists As predicted, we can see that NumPy arrays are significantly faster than lists.

Q: Is NumPy written in Cython?

NumPy is mostly written in C. The main advantage of Python is that there are a number of ways of very easily extending your code with C (ctypes, swig,f2py) / C++ (boost. python, weave.

Tags:

python

arrays

optimization

numpy

cython

TLDR: in cython, why (or when?) is iterating over a numpy array faster than iterating over a python list?

Generally: I've used Cython before and was able to get tremendous speed ups over naive python impl', However, figuring out what exactly needs to be done seems non-trivial.

Consider the following 3 implementations of a sum() function. They reside in a cython file called 'cy' (obviously, there's np.sum(), but that's besides my point..)

Naive python:

def sum_naive(A):
   s = 0
   for a in A:
       s += a
   return s

Cython with a function that expects a python list:

def sum_list(A):
    cdef unsigned long s = 0
    for a in A:
        s += a
    return s

Cython with a function that expects a numpy array.

def sum_np(np.ndarray[np.int64_t, ndim=1] A):
    cdef unsigned long s = 0
    for a in A:
        s += a
    return s

I would expect that in terms of running time, sum_np < sum_list < sum_naive, however, the following script demonstrates to the contrary (for completeness, I added np.sum() )

N = 1000000
v_np = np.array(range(N))
v_list = range(N)

%timeit cy.sum_naive(v_list)
%timeit cy.sum_naive(v_np)
%timeit cy.sum_list(v_list)
%timeit cy.sum_np(v_np)
%timeit v_np.sum()

with results:

In [18]: %timeit cyMatching.sum_naive(v_list)
100 loops, best of 3: 18.7 ms per loop

In [19]: %timeit cyMatching.sum_naive(v_np)
1 loops, best of 3: 389 ms per loop

In [20]: %timeit cyMatching.sum_list(v_list)
10 loops, best of 3: 82.9 ms per loop

In [21]: %timeit cyMatching.sum_np(v_np)
1 loops, best of 3: 1.14 s per loop

In [22]: %timeit v_np.sum()
1000 loops, best of 3: 659 us per loop

What's going on? Why is cython+numpy slow?

P.S.
I do use
#cython: boundscheck=False
#cython: wraparound=False

880

asked Sep 21 '13 21:09

Tomer Levinboim

2 Answers

There is a better way to implement this in cython that at least on my machine beats np.sum because it avoids type checking and other things that numpy normally has to do when dealing with an arbitrary array:

#cython.wraparound=False
#cython.boundscheck=False
cimport numpy as np

def sum_np(np.ndarray[np.int64_t, ndim=1] A):
    cdef unsigned long s = 0
    for a in A:
        s += a
    return s

def sum_np2(np.int64_t[::1] A):
    cdef:
        unsigned long s = 0
        size_t k

    for k in range(A.shape[0]):
        s += A[k]

    return s

And then the timings:

N = 1000000
v_np = np.array(range(N))
v_list = range(N)

%timeit sum(v_list)
%timeit sum_naive(v_list)
%timeit np.sum(v_np)
%timeit sum_np(v_np)
%timeit sum_np2(v_np)
10 loops, best of 3: 19.5 ms per loop
10 loops, best of 3: 64.9 ms per loop
1000 loops, best of 3: 1.62 ms per loop
1 loops, best of 3: 1.7 s per loop
1000 loops, best of 3: 1.42 ms per loop

You don't want to iterate over the numpy array via the Python style, but rather access elements using indexing as that it can be translated into pure C, rather than relying on the Python API.

166

answered Oct 17 '22 20:10

JoshAdel

~~a is untyped and thus there will be lots of conversions from Python to C types and back. These can be slow.~~

JoshAdel correctly pointed out that instead of iterating though, you should iterate over a range. Cython will convert the indexing to C, which is fast.

Using cython -a myfile.pyx will highlight these sorts of things for you; you want all of your loop logic to be white for maximum speed.

PS: Note that np.ndarray[np.int64_t, ndim=1] is outdated and has been deprecated in favour of the faster and more general long[:].

answered Oct 17 '22 20:10

Veedrac

Related questions
                            
                                Median combining fits images in python
                            
                                Integrate django via pydev in eclipse?
                            
                                Why does [].append() not work in python?
                            
                                Updating row in SqlAlchemy ORM
                            
                                Can we load pandas DataFrame in .NET ironpython?
                            
                                Python try/except not working
                            
                                Get a pointer to a list element
                            
                                an os.path.join() that doesn't discard before leading slash?
                            
                                Python timeout context manager with threads
                            
                                Drop root privileges for certain operations in Python
                            
                                Python requests module: urlencoding json data
                            
                                How can I set dataless areas of a hist2d plot to the zero-value color?
                            
                                In Pandas, How to use fillna to fill the whole columns with string if the column is empty originally?
                            
                                How to Delete Rows CSV in python
                            
                                Python: argparse taking a list of variable size
                            
                                using map(int, raw_input().split())
                            
                                How to organize values in a numpy array into bins that contain a certain range of values?
                            
                                check_output error in python
                            
                                How do I get rid of a circular dependency error while creating a database in sqlalchemy?
                            
                                Opencv Python display raw image

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With