I'm trying to optimize a Python algorithm by implementing it in Cython. My question is regarding a certain performance bottleneck that exists in the following code:
@cython.boundscheck(False) # turn off bounds-checking for entire function
def anglesToRGB( np.ndarray[double, ndim=2] y, np.ndarray[double, ndim=2] x ):
cdef double angle
cdef double Hp
cdef double C
cdef double X
cdef np.ndarray[double, ndim=3] res = np.zeros([y.shape[0], y.shape[1], 3], dtype=np.float64)
for i in xrange(y.shape[0]):
for j in xrange(y.shape[1]):
angle = atan2( y[i,j], x[i,j] )*180.0/PI+180
C = sqrt(pow(y[i,j],2)+pow(x[i,j],2))/360.0 #Chroma
Hp = angle/60.0
X = C*(1-fabs( Hp%2-1))
C *= 255
X *= 255
if (0. <= Hp < 1.):
res[i,j,:] = [C,X,0]
elif (1. <= Hp < 2.):
res[i,j,:] = [X,C,0]
elif (2. <= Hp < 3.):
res[i,j,:] = [0,C,X]
elif (3. <= Hp < 4.):
res[i,j,:] = [0,X,C]
elif (4. <= Hp < 5.):
res[i,j,:] = [X,C,C]
else:
res[i,j,:] = [C,0,X]
return res
I've identified the major bottleneck to be when i assign a list of values to a slice of the res array, like with
res[i,j,:] = [C,X,0]
However, if i change the assignment to
res[i,j,0] = C
res[i,j,1] = X
res[i,j,2] = 0
Then the code runs orders of magnitude faster. To me this is strange because surely the Cython compiler should be smart enough to do this for me? Or do i need to provide it with some hints first? I should note that changing the slicing to 0:3 instead of : and making the list of values a numpy array doesn't improve the performance.
What i'd like to know is why this operation is killing performance so badly and if there's any way to solve it without having to sacrifice the convenient list and slice notation.
Best regards
Nope, Cython (tested with 0.17) isn't smart enough to optimize this slice assignment away. If you look at the generated C code (use cython -a
and click any line in the HTML report to see the generated code), then you can see that
res[i,j,:] = [C,X,0]
is compiled to
[C,X,0]
(i, j, slice(None))
res.__setitem__
I.e., almost all the same things CPython would do to execute this code.
What you can do to get around this is:
cdef double v1, v2, v3
;v1, v2, v3 = C, X, 0
etc., which is optimized to three C assignments;v1, v2, v3
to res[i,j,0]
etc. in three separate assignments.If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With