Given the following data (in python 2.7):
import numpy as np
a = np.array([1,2,3,4,5,6,7,8,9,10,11,12,14])
b = np.array([8,2,3])
I want to get the sum of the first 8 elements of a, then the sum of the 9th and 10th elements, and finally the sum of the last 3 (basically the information in b). The desired output is:
[36, 19, 37]
I can do this with for loops and such, but there must be a more Pythonic and more efficient way of doing it!
That's easy with np.split:
result = [part.sum() for part in np.split(a, np.cumsum(b))[:-1]]
print(result)
>>> [36, 19, 37]
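To see why the trailing [:-1] is needed, it can help to look at the pieces np.split actually produces. This is a minimal sketch using the arrays from the question; the names pieces and sums are only illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14])
b = np.array([8, 2, 3])

# Split points are the running totals of the segment lengths: 8, 10, 13.
pieces = np.split(a, np.cumsum(b))

# Because the last split point (13) equals len(a), np.split returns one
# extra, empty piece at the end; that is what [:-1] throws away.
for piece in pieces:
    print(piece)

sums = [int(piece.sum()) for piece in pieces[:-1]]
print(sums)  # [36, 19, 37]
```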
A much faster way than np.split is:
np.add.reduceat(a, np.r_[0, np.cumsum(b)[:-1]])
What this does:
The second argument is built from b and holds the start index of each range you want to sum over. For simplicity, you can assign c = np.r_[0, np.cumsum(b)[:-1]], which for your example is array([0, 8, 10]): a 0 followed by all but the last element of the cumulative sum of b (np.cumsum(b) -> array([8, 10, 13])). reduceat only takes start indices, and the final range runs to the end of a automatically, so the trailing 13 has to be dropped.
np.add.reduceat(a, c) reduces a with the ufunc (in this case, add) over the ranges a[c[i]:c[i+1]]. When i+1 would run past the end of c, it instead reduces over a[c[i]:], that is, through the last element of a.
reduce just condenses an array to a single value. For example, np.add.reduce(a) gives the same result as np.sum(a) or a.sum(). The speed-up here comes from reduceat pushing the for loop in the answer by @jdehsa out of Python and into NumPy's compiled C core.
Speed test:
b = np.random.randint(1,10,(10000,))
a = np.random.randint(1,10,(np.sum(b),))
%timeit np.add.reduceat(a, np.r_[0, np.cumsum(b)[:-1]])
1000 loops, best of 3: 293 µs per loop
%timeit [part.sum() for part in np.split(a, np.cumsum(b))[:-1]]
10 loops, best of 3: 44.6 ms per loop
An added benefit is that it does not build a temporary list of split sub-arrays of a.
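The slicing rule that reduceat follows can be checked against plain Python slices. The sketch below assumes the arrays from the question; fast, ends, manual, b_zero and c_zero are illustrative names. It also shows one caveat that the question's data does not hit: if b contained a 0, two start indices would coincide, and reduceat would return the single element a[c[i]] for that range rather than an empty sum.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14])
b = np.array([8, 2, 3])

# Start index of each range: 0, 8, 10.
c = np.r_[0, np.cumsum(b)[:-1]]

fast = np.add.reduceat(a, c)

# The same ranges written as explicit slices; the last one runs to the end.
ends = list(c[1:]) + [len(a)]
manual = [int(a[start:end].sum()) for start, end in zip(c, ends)]

print(fast.tolist())  # [36, 19, 37]
print(manual)         # [36, 19, 37]

# Caveat: a zero-length segment gives repeated start indices, and reduceat
# then returns a[start] for that range instead of an empty sum.
b_zero = np.array([8, 0, 5])
c_zero = np.r_[0, np.cumsum(b_zero)[:-1]]  # 0, 8, 8
print(np.add.reduceat(a, c_zero).tolist())  # [36, 9, 56]
```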
You can use the reduceat method of the np.add ufunc. You just need to add a zero in front of your indices and discard the last index (if it covers the complete array):
>>> import numpy as np
>>> a = np.array([1,2,3,4,5,6,7,8,9,10,11,12,14])
>>> b = np.array([8,2,3])
>>> np.add.reduceat(a, np.append([0], np.cumsum(b)[:-1]))
array([36, 19, 37], dtype=int32)
The [:-1] discards the last index and the np.append([0], ...) adds a zero in front of the indices.
Note that this is a slightly adapted variant of DanielF's answer.
If you don't like the append, you could also create a new array containing the indices yourself:
>>> b_sum = np.zeros_like(b)
>>> np.cumsum(b[:-1], out=b_sum[1:]) # insert the cumsum in the b_sum array directly
>>> np.add.reduceat(a, b_sum)
array([36, 19, 37], dtype=int32)
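The condition "if it covers the complete array" matters: dropping the last index is only right when the lengths in b add up to len(a). One possible way to handle both cases is sketched below. segment_sums is a hypothetical helper name, not a NumPy function, and it assumes that lengths is non-empty, that every length is positive, and that the lengths do not add up to more than len(a).

```python
import numpy as np

def segment_sums(a, lengths):
    """Sum consecutive runs of `a` whose sizes are given by `lengths`.

    Hypothetical helper for illustration only; assumes `lengths` is
    non-empty, every length is > 0, and sum(lengths) <= len(a).
    """
    a = np.asarray(a)
    ends = np.cumsum(lengths)
    starts = np.r_[0, ends[:-1]]
    if ends[-1] == len(a):
        # Lengths cover the whole array: the last range ends at len(a).
        return np.add.reduceat(a, starts)
    # Lengths stop early: add the final end as an extra start index so the
    # last wanted range is bounded, then drop the leftover tail sum.
    return np.add.reduceat(a, np.r_[starts, ends[-1]])[:-1]

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14])
print(segment_sums(a, [8, 2, 3]).tolist())  # [36, 19, 37]
print(segment_sums(a, [8, 2]).tolist())     # [36, 19]
```

Without the extra index in the second case, the last range would silently run to the end of a and include elements that b never asked for.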