I am writing a rather big simulation in Python and was hoping to get some extra performance from Cython. However, for the code below I don't seem to gain much, even though it contains a fairly large loop of roughly 100k iterations.
Did I make a beginner's mistake, or is this loop size simply too small to have a big effect? (In my tests the Cython code was only about 2 times faster.)
import numpy as np
cimport numpy as np
import math

ctypedef np.complex64_t cpl_t
cpl = np.complex64

def example(double a, np.ndarray[cpl_t, ndim=2] A):
    cdef int N = 100
    cdef np.ndarray[cpl_t, ndim=3] B = np.zeros((3, N, N), dtype=cpl)
    cdef Py_ssize_t n, m
    for n in range(N):
        for m in range(N):
            if np.sqrt(A[0, n]) > 1:
                B[0, n, m] = A[0, n] + 1j * A[0, m]
    return B
You should use compiler directives. I wrote your function in Python
import numpy as np
def example_python(a, A):
    N = 100
    B = np.zeros((3, N, N), dtype=complex)
    aux = np.sqrt(A[0]).real
    for n in range(N):
        if aux[n] > 1:
            for m in range(N):
                B[0, n, m] = A[0, n] + 1j * A[0, m]
    return B
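(Side note, and an assumption on my part since your real loop body may be more complex: if it really is this simple, you can drop both loops entirely with broadcasting, which often makes plain NumPy competitive. A sketch, where example_vectorized is just a name I made up:)

```python
import numpy as np

def example_vectorized(a, A):
    # Same result as example_python, but broadcasting replaces both loops.
    N = 100
    B = np.zeros((3, N, N), dtype=complex)
    mask = np.sqrt(A[0, :N]).real > 1            # row test, computed once
    vals = A[0, :N, None] + 1j * A[0, None, :N]  # all (n, m) pairs at once
    B[0][mask] = vals[mask]                      # fill only the passing rows
    return B
```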
and in Cython (you can learn about compiler directives here)
import cython
import numpy as np
cimport numpy as np

ctypedef np.complex64_t cpl_t
cpl = np.complex64

@cython.boundscheck(False)  # compiler directive
@cython.wraparound(False)   # compiler directive
def example_cython(double a, np.ndarray[cpl_t, ndim=2] A):
    cdef int N = 100
    cdef np.ndarray[cpl_t, ndim=3] B = np.zeros((3, N, N), dtype=cpl)
    cdef np.ndarray[float, ndim=1] aux
    cdef Py_ssize_t n, m
    aux = np.sqrt(A[0, :]).real
    for n in range(N):
        if aux[n] > 1.:
            for m in range(N):
                B[0, n, m] = A[0, n] + 1j * A[0, m]
    return B
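(To run the Cython version it has to be compiled first. A minimal setup.py sketch, assuming the code above is saved as example_cython.pyx; the np.get_include() entry is needed because of the cimport numpy:)

```python
# setup.py -- minimal build script (assumes the Cython code is in example_cython.pyx)
from setuptools import setup
from Cython.Build import cythonize
import numpy as np

setup(
    ext_modules=cythonize("example_cython.pyx"),
    include_dirs=[np.get_include()],  # lets the compiler find numpy's C headers
)
```

Build in place with: python setup.py build_ext --inplace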
I compare both functions
c = np.array(np.random.rand(100,100)+1.5+1j*np.random.rand(100,100), dtype=np.complex64)
%timeit example_python(100, c)
10 loops, best of 3: 61.8 ms per loop
%timeit example_cython(100, c)
10000 loops, best of 3: 134 µs per loop
Cython is ~450 times faster than Python in this case.
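The other big win, independent of Cython, is hoisting np.sqrt out of the loop: your original code calls it once per inner iteration, and each call on a scalar goes through NumPy's Python-level dispatch. A rough sketch of that cost alone, in pure Python (names are mine):

```python
import timeit
import numpy as np

row = (np.random.rand(100) + 1.5).astype(np.complex64)

# One np.sqrt call per element, as in the original loop
per_call = timeit.timeit(lambda: [np.sqrt(row[n]).real for n in range(100)],
                         number=1000)

# One vectorized np.sqrt call for the whole row, as in the rewritten version
vectorized = timeit.timeit(lambda: np.sqrt(row).real, number=1000)

print(per_call > vectorized)  # the per-element version is much slower
```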