I am trying to speed up some Python code with Cython, and I'm using Cython's -a option to see where I can improve things. My understanding is that in the generated HTML file, the highlighted lines are the ones where Python functions are called. Is that correct?
In the following trivial function, I have declared the numpy array argument arr using the buffer syntax. I thought this allowed indexing operations to take place purely in C, without calling Python functions. However, cython -a (version 0.15) highlights the line where I set the value of an element of arr, though not the one where I read one of its elements. Why does this happen? Is there a more efficient way of accessing numpy array elements?
import numpy
cimport numpy

def foo(numpy.ndarray[double, ndim=1] arr not None):
    cdef int i
    cdef double elem
    for i in xrange(10):
        elem = arr[i]        # not highlighted
        arr[i] = 1.0 + elem  # highlighted
EDIT: Also, how does the mode buffer argument interact with numpy? Assuming I haven't changed the order argument of numpy.array from the default, is it always safe to use mode='c'? Does this actually make a difference to performance?
EDIT after delnan's comment: arr[i] += 1 also gets highlighted (that is why I split it up in the first place, to see which part of the operation was causing the issue). If I turn off bounds checking to simplify things (this makes no difference to what gets highlighted), the generated C code is:
  /* "ct.pyx":11
 *     cdef int i
 *     cdef double elem
 *     for i in xrange(10):             # <<<<<<<<<<<<<<
 *         elem = arr[i]
 *         arr[i] = 1.0 + elem
 */
  for (__pyx_t_1 = 0; __pyx_t_1 < 10; __pyx_t_1+=1) {
    __pyx_v_i = __pyx_t_1;

    /* "ct.pyx":12
 *     cdef double elem
 *     for i in xrange(10):
 *         elem = arr[i]             # <<<<<<<<<<<<<<
 *         arr[i] = 1.0 + elem
 */
    __pyx_t_2 = __pyx_v_i;
    __pyx_v_elem = (*__Pyx_BufPtrStrided1d(double *, __pyx_bstruct_arr.buf, __pyx_t_2, __pyx_bstride_0_arr));

    /* "ct.pyx":13
 *     for i in xrange(10):
 *         elem = arr[i]
 *         arr[i] = 1.0 + elem             # <<<<<<<<<<<<<<
 */
    __pyx_t_3 = __pyx_v_i;
    *__Pyx_BufPtrStrided1d(double *, __pyx_bstruct_arr.buf, __pyx_t_3, __pyx_bstride_0_arr) = (1.0 + __pyx_v_elem);
  }
You can use NumPy from Cython exactly the same as in regular Python, but by doing so you are losing potentially high speedups because Cython has support for fast access to NumPy arrays.
The answer is that the highlighter is misleading. I compiled your code, and the instructions generated under the highlighted line are the ones that handle the error cases and the return value; they are not related to the array assignment.
Indeed, if you change the code to read:
def foo(numpy.ndarray[double, ndim=1] arr not None):
    cdef int i
    cdef double elem
    for i in xrange(10):
        elem = arr[i]
        arr[i] = 1.0 + elem
    return  # add this line
The highlight then moves to the last line and no longer appears on the assignment.
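To see why the assignment itself needs no Python call, it may help to mimic what the __Pyx_BufPtrStrided1d expression in the generated C appears to do: add i times the stride (in bytes) to the buffer's base address and dereference the result as a double. The following is only an illustrative sketch in plain Python with ctypes, not Cython's actual implementation; read_strided and write_strided are hypothetical helper names introduced for this example.

```python
import ctypes
import numpy

# A strided 1-D view: every second element of a larger float64 array.
arr = numpy.arange(20, dtype=numpy.float64)[::2]
stride = arr.strides[0]   # bytes between consecutive elements of the view
base = arr.ctypes.data    # address of the view's first element

def read_strided(i):
    # Hypothetical helper: same arithmetic as
    # *(double *)((char *)buf + i * stride) in C.
    return ctypes.c_double.from_address(base + i * stride).value

def write_strided(i, value):
    # Hypothetical helper: store a double at the same computed address.
    ctypes.c_double.from_address(base + i * stride).value = value

# The body of foo(), expressed through raw address arithmetic.
for i in range(10):
    write_strided(i, 1.0 + read_strided(i))
```

Both the read and the write reduce to the same address computation, which is consistent with the two lines in the generated C looking almost identical.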
You can further speed up your code by disabling bounds checking with the @cython.boundscheck(False) decorator:
import numpy
cimport numpy
cimport cython

@cython.boundscheck(False)
def foo(numpy.ndarray[double, ndim=1] arr not None):
    cdef int i
    cdef double elem
    for i in xrange(10):
        elem = arr[i]
        arr[i] = 1.0 + elem
    return
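On the mode question: as far as I know, mode='c' is not safe in general even if you never touch the order argument, because slices and column views of a C-ordered array are themselves not C-contiguous, and a function declared with mode='c' should reject them with a ValueError at call time. The sketch below uses plain NumPy (no Cython needed) to show which arrays would qualify; the variable names are made up for illustration.

```python
import numpy

a = numpy.zeros((4, 6))            # default order='C'
print(a.flags['C_CONTIGUOUS'])     # True

col = a[:, 0]                      # 1-D view of a column: strided
every_other = a[0, ::2]            # 1-D view with a step: strided
print(col.flags['C_CONTIGUOUS'])          # False
print(every_other.flags['C_CONTIGUOUS'])  # False

# ascontiguousarray returns a C-contiguous array, copying if needed.
# Note that writes to the copy do not propagate back to `a`.
safe = numpy.ascontiguousarray(col)
print(safe.flags['C_CONTIGUOUS'])  # True
```

With mode='c' Cython can assume that the stride of the last dimension equals the item size, so the index arithmetic gets slightly simpler. For a 1-D loop like this one the gain is probably modest, so it is worth measuring before committing to the restriction.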