I need to implement a function for summing the elements of an array with a variable section length. So,
a = np.arange(10)
section_lengths = np.array([3, 2, 4])
out = accumulate(a, section_lengths)
print out
array([ 3., 7., 35.])
I attempted an implementation in cython
here:
https://gist.github.com/2784725
for performance I am comparing to the pure numpy
solution for the case where the section_lengths are all the same:
LEN = 10000
b = np.ones(LEN, dtype=np.int) * 2000
a = np.arange(np.sum(b), dtype=np.double)
out = np.zeros(LEN, dtype=np.double)
%timeit np.sum(a.reshape(-1,2000), axis=1)
10 loops, best of 3: 25.1 ms per loop
%timeit accumulate.accumulate(a, b, out)
10 loops, best of 3: 64.6 ms per loop
would you have any suggestion for improving performance?
You might try some of the following:
In addition to the @cython.boundscheck(False)
compiler directive, also try adding @cython.wraparound(False)
In your setup.py
script, try adding in some optimization flags:
ext_modules = [Extension("accumulate", ["accumulate.pyx"], extra_compile_args=["-O3",])]
Take a look at the .html file generated by cython -a accumulate.pyx
to see if there are sections that are missing static typing or relying heavily on Python C-API calls:
http://docs.cython.org/src/quickstart/cythonize.html#determining-where-to-add-types
Add a return
statement at the end of the method. Currently it is doing a bunch of unnecessary error checking in your tight loop at i_el += 1
.
Not sure if it will make a difference but I tend to make loop counters cdef unsigned int
rather than just int
You also might compare your code to numpy when section_lengths
are unequal, since it will probably require a bit more than just a simple sum
.
In the nest for loop update out[i_bas]
is slow, you can create a temporary variable to do the accumerate, and update out[i_bas]
when nest for loop finished. The following code will be as fast as numpy version:
import numpy as np
cimport numpy as np
ctypedef np.int_t DTYPE_int_t
ctypedef np.double_t DTYPE_double_t
cimport cython
@cython.boundscheck(False)
@cython.wraparound(False)
def accumulate(
np.ndarray[DTYPE_double_t, ndim=1] a not None,
np.ndarray[DTYPE_int_t, ndim=1] section_lengths not None,
np.ndarray[DTYPE_double_t, ndim=1] out not None,
):
cdef int i_el, i_bas, sec_length, lenout
cdef double tmp
lenout = out.shape[0]
i_el = 0
for i_bas in range(lenout):
tmp = 0
for sec_length in range(section_lengths[i_bas]):
tmp += a[i_el]
i_el+=1
out[i_bas] = tmp
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With