I need to perform a lot of work using 2D numpy arrays of various sizes and I would like to offload these calculations onto cython. The idea is that my 2D numpy arrays would be passed from python to cython where it would be converted into c-array or memory view and used in a cascade of other c-level functions to do the calculations.
After some profiling I ruled out using numpy arrays in cython due to some serious python overhead. Using memory views was MUCH faster and quite easy to use, but I suspect I can squeeze even more speedup from using c-arrays.
Here is my question though - how can I declare a 2D c-array in cython without predefining its dimensions with set values? For example, I can create a c-array from numpy this way:
narr = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]], dtype=np.dtype("i"))
cdef int c_arr[3][4]:
for i in range(3):
for j in range(4):
c_arr[i][j] = narr[i][j]
and then pass it to a function:
cdef void somefunction(int c_Arr[3][4]):
...
But this implies I have a fixed sizde of array, which in my case will be useless. So I tried something like this:
narr = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]], dtype=np.dtype("i"))
cdef int a = np.shape(narr)[0]
cdef int b = np.shape(narr)[1]
cdef int c_arr[a][b]: # INCORRECT - EXAMPLE ONLY
for i in range(a):
for j in range(b):
c_arr[i][j] = narr[i][j]
with the intention to pass it to a function like this:
cdef void somefunction(int a, int b, int c_Arr[a][b]):
...
But it doesn't work and the compilation fails with the error "Not allowed in a constant expression". I suspect I need t do it with malloc/free somehow? I had a look at this problem (How to declare 2D list in Cython), but it does not provide the answer to my problem.
It turns out that memory-views can be as fast as c-arrays if one makes sure that indexError checking in cython is switched-off for the memory views, which can be done by using cython compiler directive:
# cython: boundscheck=False
Thanks @Veedrac for the tip!
Any 2-dimensional array can be declared as follows: Syntax: data_type array_name[][]; (OR) data_type[][] array_name; data_type: Since Java is a statically-typed language (i.e. it expects its variables to be declared before they can be assigned values).
Like the one-dimensional arrays, two-dimensional arrays may be initialized by following their declaration with a list of initial values enclosed in braces. Ex: int a[2][3]={0,0,0,1,1,1}; initializes the elements of the first row to zero and the second row to one. The initialization is done row by row.
A 2D array is stored in the computer's memory one row following another. The address of the first byte of memory is considered as the memory location of the entire 2D array.
You just need to stop doing bounds checking:
with cython.boundscheck(False):
thesum += x_view[i,j]
that brings the speed basically up to par.
If you really want a C array from it, try:
import numpy as numpy
from numpy import int32
from numpy cimport int32_t
numpy_array = numpy.array([[]], dtype=int32)
cdef:
int32_t[:, :] cython_view = numpy_array
int32_t *c_integers_array = &cython_view[0, 0]
int32_t[4] *c_2d_array = <int32_t[4] *>c_integers_array
First you get a Numpy array. You use that to get a memory view. Then you get a pointer to its data, which you cast to pointers of the desired stride.
So after invaluable help from @Veedrac (Many thanks!) I finally came up with a script that demonstrates the use of both, memory views and c-arrays to speed up calculations in Cython. They both go down to similar speeds and so I personally think using memory-views is MUCH easier.
Here is an example cython script that "accepts" a numpy array and converts it to memory view or c-array and then performs simple array summation via c-level functions:
# cython: boundscheck=False
cimport cython
import numpy as np
cimport numpy as np
from numpy import int32
from numpy cimport int32_t
#Generate numpy array:
narr = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]], dtype=np.dtype("i"))
cdef int a = np.shape(narr)[0]
cdef int b = np.shape(narr)[1]
cdef int i, j
testsum = np.sum(narr)
print "Test summation: np.sum(narr) =", testsum
#Generate the memory view:
cdef int [:,:] x_view = narr
#Generate the 2D c-array and its pointer:
cdef:
int32_t[:, :] cython_view = narr
int32_t *c_integers_array = &cython_view[0, 0]
int32_t[4] *c_arr = <int32_t[4] *>c_integers_array
def test1():
speed_test_mview(x_view)
def test2():
speed_test_carray(&c_arr[0][0], a, b)
cdef int speed_test_mview(int[:,:] x_view):
cdef int n, i, j, thesum
# Define the view:
for n in range(10000):
thesum = 0
for i in range(a):
for j in range(b):
thesum += x_view[i, j]
cdef int speed_test_carray(int32_t *c_Arr, int a, int b):
cdef int n, i, j, thesum
for n in range(10000):
thesum = 0
for i in range(a):
for j in range(b):
thesum += c_Arr[(i*b)+j]
Then using ipython shell timing tests reveal similar speeds:
import testlib as t
Test summation: np.sum(narr) = 136
%timeit t.test1()
10000000 loops, best of 3: 46.3 ns per loop
%timeit t.test2()
10000000 loops, best of 3: 46 ns per loop
Oh and for comparison - using numpy arrays in this example took 125 ms (not shown).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With