Cython: why is size_t faster than int?

Changing certain Cython variables from type int to type size_t can significantly reduce some functions' runtimes (~30%), but I do not understand why.

For example:

cimport numpy as cnp
import numpy as np

def sum_int(cnp.int64_t[::1] A):
    cdef unsigned long s = 0
    cdef int k
    for k in range(A.shape[0]):
        s += A[k]
    return s

def sum_size_t(cnp.int64_t[::1] A):
    cdef unsigned long s = 0
    cdef size_t k
    for k in range(A.shape[0]):
        s += A[k]
    return s

a = np.array(range(1000000))

And the timing results:

In [17]: %timeit sum_int(a)   
1000 loops, best of 3: 652 µs per loop

In [18]: %timeit sum_size_t(a)
1000 loops, best of 3: 427 µs per loop

I am new to Cython, and know Fortran better than C. Help me out. What is the important difference between these two variable types that causes such a performance difference? What is it that I don't grok about Cython?
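As background for readers coming from Fortran rather than C or Python: Python sequences (and Cython memoryviews indexed with signed integers) allow negative indices that count from the end, which is relevant to the answers below:

```python
# Python-style negative indexing: A[-1] is the last element.
# A signed index variable could legally be negative, so compiled code
# must test for that case on every access; an unsigned index such as
# size_t can never be negative, so that test can be dropped.
a = [10, 20, 30]
assert a[-1] == 30           # -1 wraps to len(a) - 1 == 2
assert a[-3] == a[0] == 10   # -len(a) wraps to 0
```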

asked Jun 20 '16 by john_science
2 Answers

You'd likely have to do line-by-line profiling to find out exactly, but one thing stands out to me in the generated C file: the int version is checked for wraparound to negative indices, while the size_t version is assumed to be non-negative.

In the int loop (__pyx_t_3 is assigned from k; they are the same type):

if (__pyx_t_3 < 0) {
  __pyx_t_3 += __pyx_v_A.shape[0];
  if (unlikely(__pyx_t_3 < 0)) __pyx_t_4 = 0;
} else if (unlikely(__pyx_t_3 >= __pyx_v_A.shape[0])) __pyx_t_4 = 0;

In the size_t loop:

if (unlikely(__pyx_t_3 >= (size_t)__pyx_v_A.shape[0])) __pyx_t_4 = 0;

So no wraparound test is needed: size_t is unsigned, can never be negative, and is guaranteed not to wrap around when indexing items in memory. The rest of the generated code is virtually the same.
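The two checks above can be sketched in pure Python (a sketch of the logic only, not Cython's actual generated code; the function names are made up for illustration):

```python
def check_signed(i, n):
    # Signed index: wrap a negative index Python-style, then bounds-check.
    if i < 0:
        i += n
        if i < 0:
            raise IndexError(i)  # still negative: out of bounds
    elif i >= n:
        raise IndexError(i)
    return i

def check_unsigned(i, n):
    # An unsigned index can never be negative, so one comparison
    # against the upper bound is enough.
    if i >= n:
        raise IndexError(i)
    return i
```

Inside a hot loop, the signed version pays an extra comparison (and possibly an addition) per element, which is consistent with the gap observed in the question.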

Update, regarding your unsigned int results: what are the sizes of int and size_t on your system? If they differ, that could explain the change. In my case the generated C code for unsigned int and size_t is identical, since size_t is unsigned and happens to be unsigned int on this system.
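A quick way to check those sizes from Python, using only the standard library (ctypes mirrors the platform's C types; the LP64 figures in the comment are an assumption about a typical 64-bit Linux system):

```python
import ctypes

# On a typical 64-bit Linux (LP64) system, int is 4 bytes while long and
# size_t are 8 bytes; on such a platform size_t is *not* unsigned int,
# and an unsigned int loop counter still differs in width from size_t.
print("int:   ", ctypes.sizeof(ctypes.c_int))
print("long:  ", ctypes.sizeof(ctypes.c_long))
print("size_t:", ctypes.sizeof(ctypes.c_size_t))
```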

answered Nov 20 '22 by viraptor

On a 64-bit system there seem to be two reasons:

  1. Use an unsigned integer for the loop:

    %%cython
    
    cimport numpy as cnp
    import numpy as np
    
    def sum_int_unsigned(cnp.int64_t[::1] A):
        cdef unsigned long s = 0
        cdef unsigned k
        for k in range(A.shape[0]):
            s += A[k]
        return s
    
  2. Use a long instead of an int:

    %%cython
    
    cimport numpy as cnp
    import numpy as np
    
    def sum_int_unsigned_long(cnp.int64_t[::1] A):
        cdef unsigned long s = 0
        cdef unsigned long k
        for k in range(A.shape[0]):
            s += A[k]
        return s
    

Timings:

%timeit sum_int(a)
1000 loops, best of 3: 1.52 ms per loop

%timeit sum_size_t(a)
1000 loops, best of 3: 671 µs per loop

Using unsigned brings us halfway:

%timeit sum_int_unsigned(a) 
1000 loops, best of 3: 1.09 ms per loop

Using long accounts for the rest:

%timeit sum_int_unsigned_long(a)
1000 loops, best of 3: 648 µs per loop
answered Nov 20 '22 by Mike Müller