I'm rather new to Python and absolutely ignorant of C (unfortunately) so I am struggling to properly understand some aspects of working with Cython.
After profiling a Python program and discovering that it was just a couple of loops that were hogging most of the time, I decided to look into dumping them into Cython. Initially, I just let Cython interpret the Python as it was, and the result was a (remarkable!) ~2x speed boost. Cool!
From the Python main, I pass the function two 2-D arrays ("a" and "b") and a float, "d", and it returns a list, "newlist". As examples:
a =[[12.7, 13.5, 1.0],[23.4, 43.1, 1.0],...]
b =[[0.46,0.95,0],[4.56,0.92,0],...]
d = 0.1
Here is the original code, with just the cdefs added for Cython:
def loop(a, b, d):
cdef int i, j
cdef double x, y
newlist = []
for i in range(len(a)):
if b[i][2] != 1:
for j in range(i+1,len(a)):
if a[i] == a[j] and b[j][2] != 1:
x = b[i][0]+b[j][0]
y = b[i][1]+b[j][1]
b[i][2],b[j][2] = 1,1
if abs(y)/abs(x) > d:
if y > 0: newlist.append([a[i][0],a[i][1],y])
return newlist
In "pure Python", this ran (with several ten-thousand loops) in ~12.5s. In Cython it ran in ~6.3s. Great progress with near-zero work done!
However, with a little reading, it was clear that much, much more could be done, so I set about trying to apply some type changes to get things going even faster, following the Cython docs, here (also referenced in the comments).
Here are the collected modifications, meant to mimic the Cython docs:
import numpy as np
cimport numpy as np
DtypeA = np.float
DtypeB = np.int
ctypedef np.float_t DtypeA_t
ctypedef np.int_t DtypeB_t
def loop(np.ndarray[DtypeA_t, ndim=2] A,
np.ndarray[DtypeA_t, ndim=2] B,
np.ndarray[DtypeB_t, ndim=1] C,
float D):
cdef Py_ssize_t i, j
cdef float x, y
cdef np.ndarray[DtypeA_t, ndim=2] NEW_ARRAY = np.zeros((len(C),3), dtype=DtypeA)
for i in range(len(C)):
if C[i] != 1:
for j in range(i+1,len(C)):
if A[i][0]==A[j][0] and A[i][1]==A[j][1] and C[j]!= 1:
x = B[i][0]+B[j][0]
y = B[i][1]+B[j][1]
C[i],C[j] = 1,1
if abs(y)/abs(x) > D:
if y > 0: NEW_ARRAY[i]=([A[i][0],A[i][1],y])
return NEW_ARRAY
Among other things, I split the previous array "b" into two different input arrays "B" and "C", because each row of "b" contained 2 float elements and an integer that just acted as a flag. So I removed the flag integers and put them in a separate 1-D array, "C". So, the inputs now looked liked this:
A =[[12.7, 13.5, 1.0],[23.4, 43.1, 1.0],...]
B =[[0.46,0.95],[4.56,0.92],...]
C =[0,0,...]
D = 0.1
Ideally, this should go much faster with all the variables now being typed(?)...but obviously I'm doing something very wrong, because the function now comes in at a time of 35.3s...way WORSE than the "pure Python"!!
What am I botching so badly? Thanks for reading!
I believe the use of the indexing notation b[j][0]
may be throwing Cython off, making it impossible for it to use fast indexing operations behind the scenes. Incidentally, even in pure Python code this style is not idiomatic and may lead to slower code.
Try instead to use the notation b[j,0]
throughout and see if it improves your performance.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With