I converted a Python function to Cython by just adding some types and compiling it, and I was getting small numerical differences between the results of the Python and Cython functions. After some work I found that the differences came from accessing a NumPy array with an unsigned int index instead of an int.
I was using unsigned int indices to speed up access, following: http://docs.cython.org/src/userguide/numpy_tutorial.html#tuning-indexing-further
Anyway, I thought using unsigned ints was harmless.
See this code:
cpdef function(np.ndarray[np.float32_t, ndim=2] response, max_loc):
    cdef unsigned int x, y
    x, y = int(max_loc[0]), int(max_loc[1])
    x2, y2 = int(max_loc[0]), int(max_loc[1])
    print response[y,x], type(response[y,x]), response.dtype
    print response[y2,x2], type(response[y2,x2]), response.dtype
    print 2*(response[y,x] - min(response[y,x-1], response[y,x+1]))
    print 2*(response[y2,x2] - min(response[y2,x2-1], response[y2,x2+1]))
prints:
0.959878861904 <type 'float'> float32
0.959879 <type 'numpy.float32'> float32
1.04306024313
1.04306030273
Why does this happen? Is it a bug?
OK, as requested, here is an SSCCE with the same types and values that I used in my original function:
import numpy as np
cimport numpy as np

cpdef function():
    cdef unsigned int x, y
    max_loc2 = np.asarray([15., 25.], dtype=float)
    cdef np.ndarray[np.float32_t, ndim=2] response2 = np.zeros((49, 49), dtype=np.float32)
    x, y = int(max_loc2[0]), int(max_loc2[1])
    x2, y2 = int(max_loc2[0]), int(max_loc2[1])
    response2[y,x] = 0.959878861904
    response2[y,x-1] = 0.438348740339
    response2[y,x+1] = 0.753262758255
    print response2[y,x], type(response2[y,x]), response2.dtype
    print response2[y2,x2], type(response2[y2,x2]), response2.dtype
    print 2*(response2[y,x] - min(response2[y,x-1], response2[y,x+1]))
    print 2*(response2[y2,x2] - min(response2[y2,x2-1], response2[y2,x2+1]))
prints:
0.959878861904 <type 'float'> float32
0.959879 <type 'numpy.float32'> float32
1.04306024313
1.04306030273
I'm using Python 2.7.3, Cython 0.18, and MSVC 9 Express.
I modified the example in the question to make it simpler to read the generated C source for the module. I'm only interested in seeing the logic that creates Python float objects instead of getting np.float32 objects from the response array.
I'm using pyximport to compile the extension module. It saves the generated C file in a subdirectory of ~/.pyxbld (probably %userprofile%\.pyxbld on Windows).
import numpy as np
import pyximport
pyximport.install(setup_args={'include_dirs': [np.get_include()]})

open('_tmp.pyx', 'w').write('''
cimport numpy as np
cpdef function(np.ndarray[np.float32_t, ndim=2] response, max_loc):
    cdef unsigned int p_one, q_one
    p_one = int(max_loc[0])
    q_one = int(max_loc[1])
    p_two = int(max_loc[0])
    q_two = int(max_loc[1])
    r_one = response[q_one, p_one]
    r_two = response[q_two, p_two]
''')

import _tmp
assert hasattr(_tmp, 'function')
Here's the generated C code for the section of interest (a bit reformatted to make it easier to read). It turns out that when you use C unsigned int index variables, the generated code grabs the data directly from the array buffer and calls PyFloat_FromDouble, which coerces it to double. On the other hand, when you use Python int index variables, it takes the generic approach: it forms a tuple and calls PyObject_GetItem. This allows the ndarray to correctly honor the np.float32 dtype.
#define __Pyx_BufPtrStrided2d(type, buf, i0, s0, i1, s1) \
(type)((char*)buf + i0 * s0 + i1 * s1)
/* "_tmp.pyx":9
* p_two = int(max_loc[0])
* q_two = int(max_loc[1])
* r_one = response[q_one, p_one] # <<<<<<<<<<<<<<
* r_two = response[q_two, p_two]
*/
__pyx_t_3 = __pyx_v_q_one;
__pyx_t_4 = __pyx_v_p_one;
__pyx_t_5 = -1;
if (unlikely(__pyx_t_3 >= (size_t)__pyx_bshape_0_response))
__pyx_t_5 = 0;
if (unlikely(__pyx_t_4 >= (size_t)__pyx_bshape_1_response))
__pyx_t_5 = 1;
if (unlikely(__pyx_t_5 != -1)) {
__Pyx_RaiseBufferIndexError(__pyx_t_5);
{
__pyx_filename = __pyx_f[0];
__pyx_lineno = 9;
__pyx_clineno = __LINE__;
goto __pyx_L1_error;
}
}
__pyx_t_1 = PyFloat_FromDouble((
*__Pyx_BufPtrStrided2d(
__pyx_t_5numpy_float32_t *,
__pyx_bstruct_response.buf,
__pyx_t_3, __pyx_bstride_0_response,
__pyx_t_4, __pyx_bstride_1_response)));
if (unlikely(!__pyx_t_1)) {
__pyx_filename = __pyx_f[0];
__pyx_lineno = 9;
__pyx_clineno = __LINE__;
goto __pyx_L1_error;
}
__Pyx_GOTREF(__pyx_t_1);
__pyx_v_r_one = __pyx_t_1;
__pyx_t_1 = 0;
/* "_tmp.pyx":10
* q_two = int(max_loc[1])
* r_one = response[q_one, p_one]
* r_two = response[q_two, p_two] # <<<<<<<<<<<<<<
*/
__pyx_t_1 = PyTuple_New(2);
if (unlikely(!__pyx_t_1)) {
__pyx_filename = __pyx_f[0];
__pyx_lineno = 10;
__pyx_clineno = __LINE__;
goto __pyx_L1_error;
}
__Pyx_GOTREF(((PyObject *)__pyx_t_1));
__Pyx_INCREF(__pyx_v_q_two);
PyTuple_SET_ITEM(__pyx_t_1, 0, __pyx_v_q_two);
__Pyx_GIVEREF(__pyx_v_q_two);
__Pyx_INCREF(__pyx_v_p_two);
PyTuple_SET_ITEM(__pyx_t_1, 1, __pyx_v_p_two);
__Pyx_GIVEREF(__pyx_v_p_two);
__pyx_t_2 = PyObject_GetItem(
((PyObject *)__pyx_v_response),
((PyObject *)__pyx_t_1));
if (!__pyx_t_2) {
__pyx_filename = __pyx_f[0];
__pyx_lineno = 10;
__pyx_clineno = __LINE__;
goto __pyx_L1_error;
}
__Pyx_GOTREF(__pyx_t_2);
__Pyx_DECREF(((PyObject *)__pyx_t_1));
__pyx_t_1 = 0;
__pyx_v_r_two = __pyx_t_2;
__pyx_t_2 = 0;
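To see the numerical consequence without Cython, here is a minimal NumPy sketch of my own (the variable names a, b, c are made up; the values are the ones stored in the float32 array in the SSCCE). Widening each element to a Python float before the arithmetic, which is roughly what the unsigned int path does via PyFloat_FromDouble, gives one result; keeping np.float32 scalars, as the PyObject_GetItem path returns, gives the other.

import numpy as np

# The three values stored in the float32 response array in the SSCCE.
a = np.float32(0.959878861904)   # response2[y, x]
b = np.float32(0.438348740339)   # response2[y, x-1]
c = np.float32(0.753262758255)   # response2[y, x+1]

# unsigned int path: each element is coerced to a Python float (C double)
# first, so the whole expression is evaluated in float64.
print(2 * (float(a) - min(float(b), float(c))))   # ~1.0430602431297302

# Python int path: the elements stay np.float32, so the subtraction is
# rounded to float32 before the result is widened for printing.
print(float(2 * (a - min(b, c))))                 # ~1.043060302734375

Up to the number of digits printed, these correspond to the 1.04306024313 and 1.04306030273 lines in the question's output, which is consistent with the two code paths shown above.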