Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

cython numpy accumulate function

I need to implement a function for summing the elements of an array with a variable section length. So,

a = np.arange(10)
section_lengths = np.array([3, 2, 4])
out = accumulate(a, section_lengths)
print out
array([  3.,   7.,  35.])

I attempted an implementation in cython here:

https://gist.github.com/2784725

for performance I am comparing to the pure numpy solution for the case where the section_lengths are all the same:

LEN = 10000
b = np.ones(LEN, dtype=np.int) * 2000
a = np.arange(np.sum(b), dtype=np.double)
out = np.zeros(LEN, dtype=np.double)

%timeit np.sum(a.reshape(-1,2000), axis=1)
10 loops, best of 3: 25.1 ms per loop

%timeit accumulate.accumulate(a, b, out)
10 loops, best of 3: 64.6 ms per loop

would you have any suggestion for improving performance?

like image 337
Andrea Zonca Avatar asked May 24 '12 23:05

Andrea Zonca


Video Answer


2 Answers

You might try some of the following:

  • In addition to the @cython.boundscheck(False) compiler directive, also try adding @cython.wraparound(False)

  • In your setup.py script, try adding in some optimization flags:

    ext_modules = [Extension("accumulate", ["accumulate.pyx"], extra_compile_args=["-O3",])]

  • Take a look at the .html file generated by cython -a accumulate.pyx to see if there are sections that are missing static typing or relying heavily on Python C-API calls:

    http://docs.cython.org/src/quickstart/cythonize.html#determining-where-to-add-types

  • Add a return statement at the end of the method. Currently it is doing a bunch of unnecessary error checking in your tight loop at i_el += 1.

  • Not sure if it will make a difference but I tend to make loop counters cdef unsigned int rather than just int

You also might compare your code to numpy when section_lengths are unequal, since it will probably require a bit more than just a simple sum.

like image 73
JoshAdel Avatar answered Sep 29 '22 10:09

JoshAdel


In the nest for loop update out[i_bas] is slow, you can create a temporary variable to do the accumerate, and update out[i_bas] when nest for loop finished. The following code will be as fast as numpy version:

import numpy as np
cimport numpy as np

ctypedef np.int_t DTYPE_int_t
ctypedef np.double_t DTYPE_double_t

cimport cython
@cython.boundscheck(False)
@cython.wraparound(False)
def accumulate(
       np.ndarray[DTYPE_double_t, ndim=1] a not None,
       np.ndarray[DTYPE_int_t, ndim=1] section_lengths not None,
       np.ndarray[DTYPE_double_t, ndim=1] out not None,
       ):
    cdef int i_el, i_bas, sec_length, lenout
    cdef double tmp
    lenout = out.shape[0]
    i_el = 0
    for i_bas in range(lenout):
        tmp = 0
        for sec_length in range(section_lengths[i_bas]):
            tmp += a[i_el]
            i_el+=1
        out[i_bas] = tmp
like image 40
HYRY Avatar answered Sep 29 '22 10:09

HYRY