I'm trying to follow an example on the Continuum Analytics blog that benchmarks Python, Cython, and Numba on a sum computed with a for loop. Unfortunately, I'm seeing that Cython is barely faster than Python!
Here's my Python function definition:
def python_sum(y):
    N = len(y)
    x = y[0]
    for i in xrange(1,N):
        x += y[i]
    return x
And now my Cython function:
def cython_sum(int[:] y):
    cdef int N = y.shape[0]
    cdef int x = y[0]
    cdef int i
    for i in xrange(1,N):
        x += y[i]
    return x
Now I have a script that imports the two functions and benchmarks them:
import timeit
import numpy as np
import cython_sum
import python_sum
b = np.ones(10000)
timer = timeit.Timer(stmt='python_sum.python_sum(b)', setup='from __main__ import python_sum, b')
print "Python Sum (ms): %g" % (timer.timeit(1)*1000)
timer = timeit.Timer(stmt='cython_sum.cython_sum(b)', setup='from __main__ import cython_sum, b')
print "Cython (ms): %g" % (timer.timeit(1)*1000)
And now my output is:
Python Sum (ms): 9.44624
Cython (ms): 8.54868
Based on the graphs in the blog post linked above, I was expecting a 100x to 1000x speed-up, yet Cython is only marginally faster than vanilla Python.
Am I doing something wrong here? This seems like a pretty basic question with a simple function definition, and lots of people clearly use Cython with great success, so the error must lie with me. Can anyone shed some light on this and tell me what I'm doing wrong? Thank you!
Cython lets you write native C functions (declared with cdef), which have less call overhead than Python functions and therefore execute faster.
You can use NumPy from Cython exactly as you would in regular Python, but doing so forgoes potentially large speed-ups, because Cython supports fast, typed access to NumPy arrays.
In pure Python, the key to making such code fast is to use vectorized operations, generally implemented through NumPy's universal functions (ufuncs), which make repeated calculations on array elements much more efficient.
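As a rough sketch of the vectorized approach, assuming NumPy is installed and written for Python 3 (where xrange is spelled range), the explicit loop from python_sum can be replaced by a single ufunc reduction: `np.add.reduce` applies the `add` ufunc across the whole array in compiled code, and `np.sum` is the everyday spelling of the same operation.

```python
import numpy as np

def python_sum(y):
    # Same explicit loop as the question's python_sum (range replaces xrange)
    x = y[0]
    for i in range(1, len(y)):
        x += y[i]
    return x

b = np.ones(10000)  # float64 by default, as in the question

# The add ufunc used as a reduction: one call, no Python-level loop
vectorized = np.add.reduce(b)
# np.sum performs the same reduction with a friendlier interface
summed = np.sum(b)

print(python_sum(b), vectorized, summed)
```

For this all-ones input all three agree; on large float arrays the reductions can differ from the loop in the last few bits, because NumPy uses pairwise summation.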
I am not sure why you get that result. As a commenter said, your code as-is shouldn't even work, since you'd be passing floats (np.ones returns float64 by default) into a function expecting ints. Maybe you left a cython_sum.py file lying around in the same directory?
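The reason the original call should fail is visible through the buffer protocol, which Cython's typed memoryviews (`int[:]`, `long[:]`) use to read the array: each buffer advertises a struct-style format code, and Cython refuses a buffer whose element type does not match the declared one, raising a ValueError about a buffer dtype mismatch. A stdlib-only sketch, using the `array` module as a stand-in for NumPy arrays (an assumption made so it runs without NumPy):

```python
from array import array

# A double-precision buffer, like the float64 array np.ones(10000) returns by default
floats = array('d', [1.0] * 5)
# A C long buffer, the element type a Cython `long[:]` memoryview expects
longs = array('l', [1] * 5)

# memoryview reports each buffer's struct-style format code:
# 'd' means C double, 'l' means C long
float_format = memoryview(floats).format
long_format = memoryview(longs).format

print(float_format, long_format)
```

A NumPy integer array whose dtype is C long should report `'l'` in the same way, which is why the `long[:]` signature below can accept the integer array.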
I did the following. I created a python_sum.py that contained your exact definition of python_sum. Then I slightly changed your Cython code:
cython_sum.pyx:
def cython_sum(long[:] y): #changed `int` to `long`
    cdef int N = y.shape[0]
    cdef int x = y[0]
    cdef int i
    for i in xrange(1,N):
        x += y[i]
    return x
I made a setup file to be able to build the Cython module:
setup.py:
from setuptools import setup  # distutils is deprecated and was removed in Python 3.12
from Cython.Build import cythonize
setup(
    name = 'Cython sum test',
    ext_modules = cythonize("cython_sum.pyx"),
)
I built the module using python setup.py build_ext --inplace. Next, I ran your test code with some modifications:
test.py:
import timeit
import numpy as np
import cython_sum
import python_sum
# ** added dtype=np.int to create integers **
b = np.ones(10000, dtype=np.int)
# ** changed .timeit(1) to .timeit(1000) for each one **
timer = timeit.Timer(stmt='python_sum.python_sum(b)', setup='from __main__ import python_sum, b')
print "Python Sum (ms): %g" % (timer.timeit(1000)*1000)
timer = timeit.Timer(stmt='cython_sum.cython_sum(b)', setup='from __main__ import cython_sum, b')
print "Cython (ms): %g" % (timer.timeit(1000)*1000)
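The two changes flagged in this test.py matter: `timeit(1)` times a single call, which is easily distorted by timer resolution and one-off costs. A slightly more robust harness, sketched for Python 3 (the `globals` argument of `timeit.repeat` needs 3.5+), takes the best of several trials and divides by the call count. The helper name `per_call_ms` is hypothetical, and a plain list stands in for the NumPy array so the sketch runs without NumPy:

```python
import timeit

def python_sum(y):
    # Same loop as the question's python_sum (range replaces Python 2's xrange)
    x = y[0]
    for i in range(1, len(y)):
        x += y[i]
    return x

def per_call_ms(stmt, namespace, number=1000, repeat=5):
    """Best-of-`repeat` time for one execution of `stmt`, in milliseconds.

    Each trial runs `stmt` `number` times; dividing the fastest trial
    by `number` gives the per-call cost.
    """
    trials = timeit.repeat(stmt, globals=namespace, number=number, repeat=repeat)
    return min(trials) / number * 1000

b = [1] * 10000  # plain list instead of np.ones(10000, dtype=np.int)
print("Python sum per call (ms): %g" % per_call_ms("python_sum(b)", globals(), number=100))
```

Taking the minimum rather than the mean follows the advice in the `timeit` documentation: higher values are usually caused by other processes interfering with the timing, not by the code itself.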
And I got the following result:
$ python test.py
Python Sum (ms): 4111.74
Cython (ms): 7.06697
Now that is a nice speed-up!
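To compare with the 100x to 1000x expected in the question, the totals above (1000 calls each) can be turned into per-call figures. A small worked calculation using only the reported numbers:

```python
# Reported totals from the run above, in ms, for 1000 calls each
python_total_ms = 4111.74
cython_total_ms = 7.06697
calls = 1000

python_per_call_ms = python_total_ms / calls          # ~4.11 ms per call
cython_per_call_us = cython_total_ms / calls * 1000   # ~7.07 us per call
speedup = python_total_ms / cython_total_ms           # ~582x

print("Python per call: %.2f ms" % python_per_call_ms)
print("Cython per call: %.2f us" % cython_per_call_us)
print("Speed-up: %.0fx" % speedup)
```

A speed-up of roughly 580x falls inside the range suggested by the blog post's graphs.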
Additionally, by following the guidelines outlined here, I was able to get an additional (small) speed-up:
cython_fast_sum.pyx:
import numpy as np
cimport numpy as np
DTYPE = np.int
ctypedef np.int_t DTYPE_t
def cython_sum(np.ndarray[DTYPE_t, ndim=1] y):
    cdef int N = y.shape[0]
    cdef int x = y[0]
    cdef int i
    for i in xrange(1,N):
        x += y[i]
    return x
setup_fast.py:
from setuptools import setup  # distutils is deprecated and was removed in Python 3.12
from Cython.Build import cythonize
import numpy as np
setup(
    name = 'Cython fast sum test',
    ext_modules = cythonize("cython_fast_sum.pyx"),
    include_dirs = [np.get_include()],  # NumPy's C headers, needed for `cimport numpy`
)
test.py:
import timeit
import numpy as np
import cython_sum
import cython_fast_sum
b = np.ones(10000, dtype=np.int)
# ** note 100000 runs, not 1000 **
timer = timeit.Timer(stmt='cython_sum.cython_sum(b)', setup='from __main__ import cython_sum, b')
print "Cython naive (ms): %g" % (timer.timeit(100000)*1000)
timer = timeit.Timer(stmt='cython_fast_sum.cython_sum(b)', setup='from __main__ import cython_fast_sum, b')
print "Cython fast (ms): %g" % (timer.timeit(100000)*1000)
Result:
$ python test.py
Cython naive (ms): 676.437
Cython fast (ms): 645.797
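Because this run uses 100000 calls rather than 1000, per-call figures make it easier to compare with the earlier result. A worked calculation from the reported totals:

```python
# Reported totals for 100000 calls each, in ms
naive_total_ms = 676.437
fast_total_ms = 645.797
calls = 100000

naive_per_call_us = naive_total_ms / calls * 1000   # ~6.76 us per call
fast_per_call_us = fast_total_ms / calls * 1000     # ~6.46 us per call
gain_pct = (naive_total_ms / fast_total_ms - 1) * 100  # ~4.7% faster

print("Naive: %.2f us, fast: %.2f us, gain: %.1f%%"
      % (naive_per_call_us, fast_per_call_us, gain_pct))
```

The naive figure of about 6.76 us per call is consistent with the roughly 7.07 us per call measured earlier with 1000 runs, so the typed `np.ndarray` version buys only about a 5% improvement here.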