I have a 2D array of numbers and would like to average over different indices in each row. Say I have
import numpy as np
data = np.arange(16).reshape(4, 4)
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]
I have two lists, specifying the first (inclusive) and last (exclusive) index for each row:
start = [1, 0, 1, 2]
end = [2, 1, 3, 4]
Then I would like to achieve this:
result = []
for i in range(4):
    result.append(np.sum(data[i, start[i]:end[i]]))
which gives
[1, 4, 19, 29]
However, the arrays I use are a lot larger than in this example, so this method is too slow for me. Is there some smart way to avoid this loop?
My first idea was to flatten the array. Then, I guess, one would need to somehow make a list of slices and apply it in parallel on the array, which I don't know how to do.
Otherwise, I was thinking of using np.apply_along_axis, but I think that only works with a single function applied to every row?
Let's run with your raveling idea. You can convert the indices of your array into raveled indices like this:
ind = np.stack((start, end), axis=0)
ind += np.arange(data.shape[0]) * data.shape[1]
ind = ind.ravel(order='F')
if ind[-1] == data.size:
    ind = ind[:-1]
Now you can ravel the original array and apply np.add.reduceat to the segments thus defined:
np.add.reduceat(data.ravel(), ind)[::2] / np.subtract(end, start)
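To make the index bookkeeping concrete, here is the same computation traced on the 4x4 sample; the commented values are my trace of what each step produces:

ind = np.stack((start, end), axis=0)    # [[ 1,  0,  1,  2],
                                        #  [ 2,  1,  3,  4]]
ind += np.arange(data.shape[0]) * data.shape[1]   # add row offsets 0, 4, 8, 12:
                                        # [[ 1,  4,  9, 14],
                                        #  [ 2,  5, 11, 16]]
ind = ind.ravel(order='F')              # [ 1,  2,  4,  5,  9, 11, 14, 16]
ind = ind[:-1]                          # last entry == data.size (16), so drop it
sums = np.add.reduceat(data.ravel(), ind)   # [ 1,  5,  4, 26, 19, 36, 29]
sums[::2]                               # [ 1,  4, 19, 29] -- the loop's sums
sums[::2] / np.subtract(end, start)     # [ 1. ,  4. ,  9.5, 14.5] -- the means

The even positions of the reduceat output are the [start, end) sums; the odd positions accumulate the gaps between one row's end and the next row's start, and the [::2] discards them.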
TL;DR
def row_mean(data, start, end):
    ind = np.stack((start, end), axis=0)
    ind += np.arange(data.shape[0]) * data.shape[1]
    ind = ind.ravel(order='F')
    if ind[-1] == data.size:
        ind = ind[:-1]
    return np.add.reduceat(data.ravel(), ind)[::2] / np.subtract(end, start)
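A quick sanity check on the 4x4 sample, with start and end as defined in the question:

row_mean(data, start, end)   # array([ 1. ,  4. ,  9.5, 14.5])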
Timings
Using the exact same arrays shown in @Divakar's answer (the function definitions appear in the Timings section below), we get the following results (specific to my machine, of course):
%timeit broadcast_einsum(data, start, end)
261 ms ± 2.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit broadcast_sum(data, start, end)
405 ms ± 1.64 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit ragged_mean(data, start, end)
520 ms ± 3.68 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit row_mean(data, start, end)
45.6 ms ± 708 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Somewhat surprisingly, this method runs 5-10x faster than all the others, despite doing lots of extra work by adding up all the numbers between the regions of interest. The reason is probably that it has extremely low overhead: the arrays of indices are small, and it only makes a single pass over a 1D array.
Let's try to create a mask with broadcasting:
start = np.array([1,0,1,2])
end = np.array([2,1,3,4])
idx = np.arange(data.shape[1])
mask = (start[:,None] <= idx) & (idx < end[:,None])
# this is for sum
(data * mask).sum(1)
Output:
array([ 1, 4, 19, 29])
And if you need the average:
(data * mask).sum(1) / mask.sum(1)
which gives:
array([ 1. , 4. , 9.5, 14.5])
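For the sample arrays the mask selects exactly the [start, end) window in each row; printing it (my addition) makes that visible:

print(mask)
# [[False  True False False]
#  [ True False False False]
#  [False  True  True False]
#  [False False  True  True]]

Note that data * mask materializes a full n-by-m temporary and reads every element of data, however narrow the windows are; that cost shows up in the timings below.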
Here's one way with broadcasting to create the masking array and then using np.einsum to sum those per row using that mask -
start = np.asarray(start)
end = np.asarray(end)
r = np.arange(data.shape[1])
m = (r>=start[:,None]) & (r<end[:,None])
out = np.einsum('ij,ij->i',data,m)
To get the averages, divide by the mask summations -
avg_out = np.einsum('ij,ij->i',data,m)/np.count_nonzero(m,axis=1)
Or, for the last step, use np.matmul/@:
out = (data[:,None] @ m[:,:,None]).ravel()
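The matmul form works batch-wise: data[:,None] has shape (n, 1, m) and m[:,:,None] has shape (n, m, 1), so each row becomes a (1, m) @ (m, 1) product, i.e. the masked row sum, and the (n, 1, 1) result is flattened back to (n,). Checked on the sample (the divide-by-count line is my addition, mirroring the count_nonzero step above):

out = (data[:,None] @ m[:,:,None]).ravel()   # array([ 1,  4, 19, 29])
out / m.sum(1)                               # array([ 1. ,  4. ,  9.5, 14.5])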
Timings
# @Quang Hoang's soln with sum method
def broadcast_sum(data, start, end):
    idx = np.arange(data.shape[1])
    mask = (start[:,None] <= idx) & (idx < end[:,None])
    return (data * mask).sum(1) / mask.sum(1)

# From earlier in this post
def broadcast_einsum(data, start, end):
    r = np.arange(data.shape[1])
    m = (r >= start[:,None]) & (r < end[:,None])
    return np.einsum('ij,ij->i', data, m) / np.count_nonzero(m, axis=1)

# @Paul Panzer's soln: per-row prefix sums, so each window sum is the
# difference of two cumsum entries
def ragged_mean(data, left, right):
    n, m = data.shape
    ps = np.zeros((n, m+1), data.dtype)
    left, right = map(np.asarray, (left, right))
    rng = np.arange(len(data))
    np.cumsum(data, axis=1, out=ps[:,1:])
    return (ps[rng,right] - ps[rng,left]) / (right - left)

# @Mad Physicist's soln
def row_mean(data, start, end):
    ind = np.stack((start, end), axis=0)
    ind += np.arange(data.shape[0]) * data.shape[1]
    ind = ind.ravel(order='F')
    if ind[-1] == data.size:
        ind = ind[:-1]
    return np.add.reduceat(data.ravel(), ind)[::2] / np.subtract(end, start)
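Before timing, a quick agreement check on the question's 4x4 sample (my addition; the expected means are the loop's sums divided by the window lengths):

data = np.arange(16).reshape(4, 4)
start = np.array([1, 0, 1, 2])
end = np.array([2, 1, 3, 4])
for f in (broadcast_sum, broadcast_einsum, ragged_mean, row_mean):
    assert np.allclose(f(data, start, end), [1.0, 4.0, 9.5, 14.5])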
1. Tall array
Using the given sample and tiling it along rows:
In [74]: data = np.arange(16).reshape(4,4)
...: start = [1,0,1,2]
...: end = [2,1,3,4]
...:
...: N = 100000
...: data = np.repeat(data,N,axis=0)
...: start = np.tile(start,N)
...: end = np.tile(end,N)
In [75]: %timeit broadcast_sum(data, start, end)
...: %timeit broadcast_einsum(data, start, end)
...: %timeit ragged_mean(data, start, end)
...: %timeit row_mean(data, start, end)
41.4 ms ± 3.4 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
38.8 ms ± 996 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
24 ms ± 525 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
22.5 ms ± 1.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
2. Square array
Using a large square array (the given sample's square shape, scaled up) -
In [76]: np.random.seed(0)
...: data = np.random.rand(10000,10000)
...: start = np.random.randint(0,5000,10000)
...: end = start + np.random.randint(1,5000,10000)
In [77]: %timeit broadcast_sum(data, start, end)
...: %timeit broadcast_einsum(data, start, end)
...: %timeit ragged_mean(data, start, end)
...: %timeit row_mean(data, start, end)
759 ms ± 3.13 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
514 ms ± 5.25 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
932 ms ± 4.78 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
72.5 ms ± 587 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)