I am doing some performance tests on a variant of the prime number generator from http://docs.cython.org/src/tutorial/numpy.html. The performance measurements below are with kmax=1000.
Pure Python implementation, running in CPython: 0.15s
Pure Python implementation, running in Cython: 0.07s
def primes(kmax):
    p = []
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p.append(n)
            k = k + 1
        n = n + 1
    return p
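The question does not show how the timings were taken; numbers like these are usually collected with timeit, for example with a small harness like the sketch below (the module name primes is an assumption, not from the original post):

import timeit

# Hypothetical harness: assumes the function above lives in a module named primes.
t = timeit.timeit("primes(1000)", setup="from primes import primes", number=10)
print(t / 10, "seconds per call")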
Pure Python+Numpy implementation, running in CPython: 1.25s
import numpy
def primes(kmax):
    p = numpy.empty(kmax, dtype=int)
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p[k] = n
            k = k + 1
        n = n + 1
    return p
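Part of the reason this is slower than the plain-list version is that indexing a NumPy array one element at a time produces a NumPy scalar object on every access, which costs more than indexing a Python list. A small sketch (not from the original post) that makes the per-element overhead visible:

import timeit

setup = "import numpy; lst = list(range(1000)); arr = numpy.arange(1000)"
# One modulo per element, repeated 1000 times: list indexing vs. array indexing.
print(timeit.timeit("for x in range(1000): lst[x] % 7", setup=setup, number=1000))
print(timeit.timeit("for x in range(1000): arr[x] % 7", setup=setup, number=1000))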
Cython implementation using int*: 0.003s
from libc.stdlib cimport malloc, free
def primes(int kmax):
    cdef int n, k, i
    cdef int *p = <int *>malloc(kmax * sizeof(int))
    result = []
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p[k] = n
            k = k + 1
            result.append(n)
        n = n + 1
    free(p)
    return result
The above performs great but looks horrible, as it holds two copies of the data (the C array and the Python list)... so I tried reimplementing it:
Cython + Numpy: 1.01s
import numpy as np
cimport numpy as np
cimport cython

DTYPE = np.int_  # np.int was removed from modern NumPy; np.int_ matches np.int_t
ctypedef np.int_t DTYPE_t

@cython.boundscheck(False)
def primes(DTYPE_t kmax):
    cdef DTYPE_t n, k, i
    cdef np.ndarray p = np.empty(kmax, dtype=DTYPE)
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p[k] = n
            k = k + 1
        n = n + 1
    return p
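For completeness: the Cython versions have to be compiled before they can be imported. A minimal build script along these lines would do it (the .pyx file names here are just placeholders, not from the original post):

# setup.py: compile the Cython modules shown above.
from setuptools import setup
from Cython.Build import cythonize
import numpy

setup(
    ext_modules=cythonize(["primes_cpointer.pyx", "primes_numpy.pyx"]),
    include_dirs=[numpy.get_include()],  # needed for the "cimport numpy" version
)

Run it with python setup.py build_ext --inplace, then import the resulting extension modules as usual.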
Questions:
How do I cast a NumPy array to an int*? The code below doesn't work:
cdef numpy.nparray a = numpy.zeros(100, dtype=int)
cdef int * p = <int *>a.data
Create a typed memoryview of the array:

cdef DTYPE_t [:] p_view = p

Using this instead of p in the calculations reduced the runtime from 580 ms down to 2.8 ms for me, which is about the same runtime as the implementation using int*, and about the most you can expect from this change.
import numpy as np
cimport numpy as np
cimport cython

DTYPE = np.int_  # np.int was removed from modern NumPy; np.int_ matches np.int_t
ctypedef np.int_t DTYPE_t

@cython.boundscheck(False)
def primes(DTYPE_t kmax):
    cdef DTYPE_t n, k, i
    cdef np.ndarray p = np.empty(kmax, dtype=DTYPE)
    cdef DTYPE_t [:] p_view = p
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p_view[i] != 0:
            i = i + 1
        if i == k:
            p_view[k] = n
            k = k + 1
        n = n + 1
    return p
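As for the question about casting a NumPy array to an int*: one way that works is to take the address of the first element of a typed memoryview over the array. A minimal sketch (the function name is just for illustration), assuming a C-contiguous array whose dtype matches C int (np.intc):

import numpy

def int_pointer_demo():
    a = numpy.zeros(100, dtype=numpy.intc)   # dtype must match C int
    cdef int[::1] view = a                   # typed memoryview over a's buffer
    cdef int *p = &view[0]                   # plain C pointer into the array's data
    p[0] = 42                                # writes through to the NumPy array
    return a

The same idea applies to p_view above: &p_view[0] gives a raw DTYPE_t pointer into the array, so there is no need to keep a separate C copy of the data.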