Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is Cython slower than vectorized NumPy?

Consider the following Cython code :

cimport cython
cimport numpy as np
import numpy as np

@cython.boundscheck(False)
@cython.wraparound(False)
def test_memoryview(double[:] a, double[:] b):
    cdef int i
    for i in range(a.shape[0]):
        a[i] += b[i]

@cython.boundscheck(False)
@cython.wraparound(False)
def test_numpy(np.ndarray[double, ndim=1] a, np.ndarray[double, ndim=1] b):
    cdef int i
    for i in range(a.shape[0]):
        a[i] += b[i]

def test_numpyvec(a, b):
    a += b

def gendata(nb=40000000):
    a = np.random.random(nb)
    b = np.random.random(nb)
    return a, b

Running it in the interpreter yields (after a few runs to warm up the cache) :

In [14]: %timeit -n 100 test_memoryview(a, b)
100 loops, best of 3: 148 ms per loop

In [15]: %timeit -n 100 test_numpy(a, b)
100 loops, best of 3: 159 ms per loop

In [16]: %timeit -n 100 test_numpyvec(a, b)
100 loops, best of 3: 124 ms per loop

# See answer below :
In [17]: %timeit -n 100 test_raw_pointers(a, b)
100 loops, best of 3: 129 ms per loop

I tried it with different dataset sizes, and consistently had the vectorized NumPy function run faster than the compiled Cython code, while I was expecting Cython to be on par with vectorized NumPy in terms of performance.

Did I forget an optimization in my Cython code? Does NumPy use something (BLAS?) in order to make such simple operations run faster? Can I improve the performance of this code?

Update: The raw pointer version seems to be on par with NumPy. So apparently there's some overhead in using memory view or NumPy indexing.

like image 385
F.X. Avatar asked Jun 19 '14 13:06

F.X.


People also ask

Is Cython faster than NumPy?

Notice that here we're using the Python NumPy, imported using the import numpy statement. By running the above code, Cython took just 0.001 seconds to complete. For Python, the code took 0.003 seconds. Cython is nearly 3x faster than Python in this case.

Why is NumPy vectorization faster?

A major reason why vectorization is faster than its for loop counterpart is due to the underlying implementation of Numpy operations. As many of you know (if you're familiar with Python), Python is a dynamically typed language.

Is NP vectorize faster than loop?

Vectorized implementations (numpy) are much faster and more efficient as compared to for-loops. To really see HOW large the difference is, let's try some simple operations used in most machine learnign algorithms (especially deep learning).

Why is Numba faster than Cython?

Numba compiled algorithms may make the runtime of the Python codes up to a million times faster and thus may reach the speed of C. In addition, with the increasing number of operations, the computation time is usually significantly faster than Cython, the other compiler used for faster processing.


1 Answers

A change that slightly increases the speed is to specify the stride:

def test_memoryview_inorder(double[::1] a, double[::1] b):
    cdef int i
    for i in range(a.shape[0]):
        a[i] += b[i]
like image 65
Veedrac Avatar answered Oct 19 '22 02:10

Veedrac