Fastest way to calculate the sum of specific regions of an array

Question

Given the following data (in python 2.7):

import numpy as np
a = np.array([1,2,3,4,5,6,7,8,9,10,11,12,14])
b = np.array([8,2,3])

I want to get the sum of the first 8 elements in a, then the sum of the 9 and 10 element and in the end the last 3 (basic the information in b). The desired output is:

[36, 19, 37]

I can do this with for loops and such, but there must be a more pythonic way and a more efficient way of doing!

jdehesa · Accepted Answer

That's easy with np.split:

result = [part.sum() for part in np.split(a, np.cumsum(b))[:-1]]
print(result)
>>> [36, 19, 37]

Daniel F · Answer

A much faster way than np.split is:

np.add.reduceat(a, np.r_[0, np.cumsum(b)[:-1]])

What this does:

Creates an array of ascending indices out of b corresponding to the ranges you want to sum over - for simplicity, you can assign c = np.r_[0, np.cumsum(b)[:-1]] which for your example would be array([0, 8, 10]) - which is 0 followed all but the last element of the cumulative sum of b (np.cumsum(b) -> array([8, 10, 13]) (the domain of np.ufunc.reduceat is exclusive of the endpoint, so we have to get rid of that 13)
np.ufunc.reduceat(a, c) reduces a by ufunc (in this case, add) over ranges specified by c[i]:c[i+1]. When i+1 would overflow c, it instead reduces over c[i]:-1
reduce just condenses an array to a single value. For example, np.add.reduce(a) is equivalent to (but slower than) np.sum(a) (which is in turn slower than a.sum()). However, since reduceat pushes the for loop in the answer by @jdehsa out of python and into numpy core compiled c-code, it is much faster.

Speed test:

b = np.random.randint(1,10,(10000,))
a = np.random.randint(1,10,(np.sum(b),))

%timeit np.add.reduceat(a, np.r_[0, np.cumsum(b)[:-1]])
1000 loops, best of 3: 293 µs per loop
%timeit [part.sum() for part in np.split(a, np.cumsum(b))[:-1]]
10 loops, best of 3: 44.6 ms per loop

And with the added benefit of not wasting memory creating a temporary split copy of a

MSeifert · Answer

You can use the reduceat method of the np.add ufunc. You just need to add a zero in front of your indices and discard the last index (if it covers the complete array):

>>> import numpy as np
>>> a = np.array([1,2,3,4,5,6,7,8,9,10,11,12,14])
>>> b = np.array([8,2,3])
>>> np.add.reduceat(a, np.append([0], np.cumsum(b)[:-1]))
array([36, 19, 37], dtype=int32)

The [:-1] discards the last index and the np.append([0], adds a zero in front of the indices.

Note that this is a slightly adapted variant of DanielFs answer.

If you don't like the append you could also create a new array yourself containing the indices:

>>> b_sum = np.zeros_like(b)
>>> np.cumsum(b[:-1], out=b_sum[1:])  # insert the cumsum in the b_sum array directly
>>> np.add.reduceat(a, b_sum)
array([36, 19, 37], dtype=int32)

Fastest way to calculate the sum of specific regions of an array

Tags:

performance

python

numpy

python-2.7

sum

Diogo Santos

3 Answers

jdehesa

Daniel F

MSeifert

Recent Activity

Donate For Us

Fastest way to calculate the sum of specific regions of an array

Tags:

performance

python

numpy

python-2.7

sum

Diogo Santos

3 Answers

jdehesa

Daniel F

MSeifert

Related questions

Recent Activity

Donate For Us