Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fastest way to calculate the sum of specific regions of an array

Given the following data (in python 2.7):

import numpy as np
a = np.array([1,2,3,4,5,6,7,8,9,10,11,12,14])
b = np.array([8,2,3])

I want to get the sum of the first 8 elements in a, then the sum of the 9 and 10 element and in the end the last 3 (basic the information in b). The desired output is:

[36, 19, 37]

I can do this with for loops and such, but there must be a more pythonic way and a more efficient way of doing!

like image 206
Diogo Santos Avatar asked Aug 03 '17 10:08

Diogo Santos


3 Answers

That's easy with np.split:

result = [part.sum() for part in np.split(a, np.cumsum(b))[:-1]]
print(result)
>>> [36, 19, 37]
like image 190
jdehesa Avatar answered Oct 11 '22 02:10

jdehesa


A much faster way than np.split is:

np.add.reduceat(a, np.r_[0, np.cumsum(b)[:-1]]) 

What this does:

  1. Creates an array of ascending indices out of b corresponding to the ranges you want to sum over - for simplicity, you can assign c = np.r_[0, np.cumsum(b)[:-1]] which for your example would be array([0, 8, 10]) - which is 0 followed all but the last element of the cumulative sum of b (np.cumsum(b) -> array([8, 10, 13]) (the domain of np.ufunc.reduceat is exclusive of the endpoint, so we have to get rid of that 13)
  2. np.ufunc.reduceat(a, c) reduces a by ufunc (in this case, add) over ranges specified by c[i]:c[i+1]. When i+1 would overflow c, it instead reduces over c[i]:-1
  3. reduce just condenses an array to a single value. For example, np.add.reduce(a) is equivalent to (but slower than) np.sum(a) (which is in turn slower than a.sum()). However, since reduceat pushes the for loop in the answer by @jdehsa out of python and into numpy core compiled c-code, it is much faster.

Speed test:

b = np.random.randint(1,10,(10000,))
a = np.random.randint(1,10,(np.sum(b),))

%timeit np.add.reduceat(a, np.r_[0, np.cumsum(b)[:-1]])
1000 loops, best of 3: 293 µs per loop
%timeit [part.sum() for part in np.split(a, np.cumsum(b))[:-1]]
10 loops, best of 3: 44.6 ms per loop

And with the added benefit of not wasting memory creating a temporary split copy of a

like image 8
Daniel F Avatar answered Oct 11 '22 02:10

Daniel F


You can use the reduceat method of the np.add ufunc. You just need to add a zero in front of your indices and discard the last index (if it covers the complete array):

>>> import numpy as np
>>> a = np.array([1,2,3,4,5,6,7,8,9,10,11,12,14])
>>> b = np.array([8,2,3])
>>> np.add.reduceat(a, np.append([0], np.cumsum(b)[:-1]))
array([36, 19, 37], dtype=int32)

The [:-1] discards the last index and the np.append([0], adds a zero in front of the indices.

Note that this is a slightly adapted variant of DanielFs answer.

If you don't like the append you could also create a new array yourself containing the indices:

>>> b_sum = np.zeros_like(b)
>>> np.cumsum(b[:-1], out=b_sum[1:])  # insert the cumsum in the b_sum array directly
>>> np.add.reduceat(a, b_sum)
array([36, 19, 37], dtype=int32)
like image 4
MSeifert Avatar answered Oct 11 '22 00:10

MSeifert