Changing certain Cython variables from type int to type size_t can significantly reduce some functions' run times (~30%), but I do not understand why.
For example:
    cimport numpy as cnp
    import numpy as np

    def sum_int(cnp.int64_t[::1] A):
        cdef unsigned long s = 0
        cdef int k
        for k in xrange(A.shape[0]):
            s += A[k]
        return s

    def sum_size_t(cnp.int64_t[::1] A):
        cdef unsigned long s = 0
        cdef size_t k
        for k in xrange(A.shape[0]):
            s += A[k]
        return s

    a = np.array(range(1000000))
And the timing results:
    In [17]: %timeit sum_int(a)
    1000 loops, best of 3: 652 µs per loop

    In [18]: %timeit sum_size_t(a)
    1000 loops, best of 3: 427 µs per loop
I am new to Cython, and know Fortran better than C. Help me out. What is the important difference between these two variable types that causes such a performance difference? What is it that I don't grok about Cython?
You'd likely have to do line-by-line profiling to find out exactly, but one thing stands out to me from the generated C file: the int version is checked for wraparound to negative numbers, while the size_t version is assumed OK.
In the int loop (t_3 is assigned from k; they're the same type):

    if (__pyx_t_3 < 0) {
        __pyx_t_3 += __pyx_v_A.shape[0];
        if (unlikely(__pyx_t_3 < 0)) __pyx_t_4 = 0;
    } else if (unlikely(__pyx_t_3 >= __pyx_v_A.shape[0])) __pyx_t_4 = 0;
In the size_t loop:
    if (unlikely(__pyx_t_3 >= (size_t)__pyx_v_A.shape[0])) __pyx_t_4 = 0;
So no wraparound test is needed because size_t is unsigned: an index of that type can never be negative, so only the upper bound has to be checked. The rest is virtually the same.
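
As a rough illustration of the two checks quoted above, here is a minimal standalone C sketch. It is not the code Cython emits; the function names resolve_signed_index and resolve_unsigned_index are hypothetical and only mirror the structure of the generated branches. Both return the resolved index, or -1 when the index is out of range.

```c
#include <assert.h>
#include <stddef.h>

/* Signed index: a negative value counts from the end, so it is first
   shifted by the length and then checked again. Hypothetical helper. */
long resolve_signed_index(long idx, long length)
{
    assert(length >= 0);
    if (idx < 0) {
        idx += length;             /* e.g. -1 becomes length - 1 */
        if (idx < 0) return -1;    /* still negative: out of range */
    } else if (idx >= length) {
        return -1;                 /* past the end */
    }
    return idx;
}

/* Unsigned index: it cannot be negative, so a single comparison
   against the length is enough. Hypothetical helper. */
long resolve_unsigned_index(size_t idx, size_t length)
{
    if (idx >= length) return -1;
    return (long)idx;
}
```

The signed version needs an extra branch on every access, which is the difference visible in the generated C. Whether that branch alone explains the measured gap would still need profiling.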
Update: regarding your unsigned int results - what are the sizes of int and size_t on your system? Any chance they differ, causing the change? In my case the C code for unsigned int and size_t is identical (size_t is unsigned, and specifically unsigned int, on this system).
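
One way to answer the size question is to ask the C compiler directly. The sketch below is only an illustration; the helper names (index_bits_unsigned_int, index_bits_size_t, same_index_width) are made up for this example.

```c
#include <limits.h>
#include <stddef.h>

/* Width in bits of unsigned int on this platform. */
unsigned index_bits_unsigned_int(void)
{
    return (unsigned)(sizeof(unsigned int) * CHAR_BIT);
}

/* Width in bits of size_t on this platform. */
unsigned index_bits_size_t(void)
{
    return (unsigned)(sizeof(size_t) * CHAR_BIT);
}

/* Returns 1 when both types have the same width, in which case the
   two loop variants could plausibly compile to identical code. */
int same_index_width(void)
{
    return sizeof(unsigned int) == sizeof(size_t);
}
```

On typical 64-bit Linux and macOS builds, unsigned int is 32 bits and size_t is 64 bits, so the two widths differ; on typical 32-bit builds both are 32 bits.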
On a 64-bit system there seem to be two reasons:
Use an unsigned integer for the loop:
    %%cython
    cimport numpy as cnp
    import numpy as np

    def sum_int_unsigned(cnp.int64_t[::1] A):
        cdef unsigned long s = 0
        cdef unsigned k
        for k in xrange(A.shape[0]):
            s += A[k]
        return s
Use a long instead of an int:

    %%cython
    cimport numpy as cnp
    import numpy as np

    def sum_int_unsigned_long(cnp.int64_t[::1] A):
        cdef unsigned long s = 0
        cdef unsigned long k
        for k in xrange(A.shape[0]):
            s += A[k]
        return s
Timings:
    %timeit sum_int(a)
    1000 loops, best of 3: 1.52 ms per loop

    %timeit sum_size_t(a)
    1000 loops, best of 3: 671 µs per loop

Using unsigned brings us half way:

    %timeit sum_int_unsigned(a)
    1000 loops, best of 3: 1.09 ms per loop

Using long accounts for the rest:

    %timeit sum_int_unsigned_long(a)
    1000 loops, best of 3: 648 µs per loop
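
To make the width effect concrete, here is a plain C sketch of the same loop with a narrow signed counter and with a pointer-width counter. It is an assumption-laden illustration, not the code Cython generates: the function names are hypothetical, and the bounds checks discussed earlier are left out. On common 64-bit platforms an int counter is 32 bits and has to be widened to 64 bits before it can be used as an array offset. Optimizing compilers can often remove that cost, so treat it as a plausible contributor rather than a measured cause.

```c
#include <stddef.h>
#include <stdint.h>

/* Counter is an int (32 bits on common 64-bit platforms); it must be
   widened to pointer width to address data[k]. Hypothetical name. */
uint64_t sum_with_int_index(const int64_t *data, int n)
{
    uint64_t s = 0;
    for (int k = 0; k < n; ++k)
        s += (uint64_t)data[k];
    return s;
}

/* Counter is a size_t, the platform's native unsigned index type,
   so no widening is needed. Hypothetical name. */
uint64_t sum_with_size_t_index(const int64_t *data, size_t n)
{
    uint64_t s = 0;
    for (size_t k = 0; k < n; ++k)
        s += (uint64_t)data[k];
    return s;
}
```

Compiling both with optimizations enabled and comparing the assembly (for example with the compiler's -S flag) is a direct way to see whether the two loops actually differ on a given machine.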