
Perform sum over different slice of each row for 2D array

Tags: python, numpy

I have a 2D array of numbers and would like to average over a different range of indices in each row. Say I have

import numpy as np

data = np.arange(16).reshape(4, 4)
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]

I have two lists, specifying the first (inclusive) and last (exclusive) index for each row:

start = [1, 0, 1, 2]
end = [2, 1, 3, 4]

Then I would like to achieve this:

result = []
for i in range(4):
    result.append(np.sum(data[i, start[i]:end[i]]))

which gives

[1, 4, 19, 29]

However, the arrays I use are a lot larger than in this example, so this method is too slow for me. Is there some smart way to avoid this loop?

My first idea was to flatten the array. Then, I guess, one would need to somehow make a list of slices and apply it in parallel on the array, which I don't know how to do.

Otherwise, I was thinking of using np.apply_along_axis, but I don't see how to pass a different slice for each row.

asked Sep 30 '20 by cschoenle

3 Answers

Let's run with your raveling idea. You can convert the per-row start and end indices into indices into the raveled (flattened) array like this:

ind = np.stack((start, end), axis=0)
ind += np.arange(data.shape[0]) * data.shape[1]
ind = ind.ravel(order='F')
if ind[-1] == data.size:
    ind = ind[:-1]

Now you can ravel the original array and apply np.add.reduceat over the segments thus defined:

np.add.reduceat(data.ravel(), ind)[::2] / np.subtract(end, start)

TL;DR

def row_mean(data, start, end):
    ind = np.stack((start, end), axis=0)
    ind += np.arange(data.shape[0]) * data.shape[1]
    ind = ind.ravel(order='F')
    if ind[-1] == data.size:
        ind = ind[:-1]
    return np.add.reduceat(data.ravel(), ind)[::2] / np.subtract(end, start)
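As a quick sanity check, here is the TL;DR function applied to the question's sample data (redefined here so the snippet runs on its own):

```python
import numpy as np

def row_mean(data, start, end):
    # Map each row's (start, end) pair to positions in the flattened array.
    ind = np.stack((start, end), axis=0)
    ind += np.arange(data.shape[0]) * data.shape[1]
    ind = ind.ravel(order='F')
    if ind[-1] == data.size:
        ind = ind[:-1]  # reduceat indices must be strictly less than data.size
    return np.add.reduceat(data.ravel(), ind)[::2] / np.subtract(end, start)

data = np.arange(16).reshape(4, 4)
start = [1, 0, 1, 2]
end = [2, 1, 3, 4]

print(row_mean(data, start, end))  # per-row means: 1.0, 4.0, 9.5, 14.5
```

The [::2] keeps only every other reduceat segment: the even segments are the requested slices, the odd ones are the "gaps" between one row's end and the next row's start.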

Timings

Using the exact same arrays shown in @Divakar's answer, we get the following results (specific to my machine of course):

%timeit einsum_mean(data, start, end)
261 ms ± 2.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit broadcasting_mean(data, start, end)
405 ms ± 1.64 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit ragged_mean(data, start, end)
520 ms ± 3.68 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit row_mean(data, start, end)
45.6 ms ± 708 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Somewhat surprisingly, this method runs roughly 6-11x faster than all the others, despite doing lots of extra work by adding up all the numbers between the regions of interest. The reason is probably that it has extremely low overhead: the arrays of indices are small, and it only makes a single pass over a 1D array.

answered Oct 05 '22 by Mad Physicist

Let's try to create a mask with broadcasting:

start = np.array([1,0,1,2])
end = np.array([2,1,3,4])

idx = np.arange(data.shape[1])

mask = (start[:, None] <= idx) & (idx < end[:, None])

# this is for sum   
(data * mask).sum(1)

Output:

array([ 1,  4, 19, 29])

And if you need average:

(data * mask).sum(1) / mask.sum(1)

which gives:

array([ 1. ,  4. ,  9.5, 14.5])
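For reference, the whole mask approach as one self-contained snippet. Note that the mask is a full boolean array the same shape as data, so it costs O(n·m) temporary memory:

```python
import numpy as np

data = np.arange(16).reshape(4, 4)
start = np.array([1, 0, 1, 2])
end = np.array([2, 1, 3, 4])

idx = np.arange(data.shape[1])
# True where column j lies in [start[i], end[i]) for row i
mask = (start[:, None] <= idx) & (idx < end[:, None])

sums = (data * mask).sum(1)   # per-row sums: 1, 4, 19, 29
means = sums / mask.sum(1)    # per-row means: 1.0, 4.0, 9.5, 14.5
```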
answered Oct 05 '22 by Quang Hoang

Here's one way: use broadcasting to create the masking array, then use np.einsum to sum each row under that mask -

start = np.asarray(start)
end = np.asarray(end)

r = np.arange(data.shape[1])
m = (r >= start[:, None]) & (r < end[:, None])
out = np.einsum('ij,ij->i', data, m)

To get the averages, divide by the mask summations -

avg_out = np.einsum('ij,ij->i', data, m) / np.count_nonzero(m, axis=1)

Or for the last step, use np.matmul/@:

out = (data[:,None] @ m[:,:,None]).ravel()
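As a quick consistency check between the two formulations (a sketch; the boolean mask is cast to the data's dtype before the matmul to sidestep any type-promotion surprises):

```python
import numpy as np

data = np.arange(16).reshape(4, 4)
start = np.asarray([1, 0, 1, 2])
end = np.asarray([2, 1, 3, 4])

r = np.arange(data.shape[1])
m = (r >= start[:, None]) & (r < end[:, None])

out_einsum = np.einsum('ij,ij->i', data, m)
# (n, 1, m) @ (n, m, 1) -> (n, 1, 1), then ravel back to shape (n,)
out_matmul = (data[:, None] @ m.astype(data.dtype)[:, :, None]).ravel()

assert (out_einsum == out_matmul).all()  # both give the row sums 1, 4, 19, 29
```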

Timings

# @Quang Hoang's soln with sum method
def broadcast_sum(data, start, end):
    idx = np.arange(data.shape[1])
    mask = (start[:, None] <= idx) & (idx < end[:, None])
    return (data * mask).sum(1) / mask.sum(1)

# From earlier in this post
def broadcast_einsum(data, start, end):
    r = np.arange(data.shape[1])
    m = (r >= start[:, None]) & (r < end[:, None])
    return np.einsum('ij,ij->i', data, m) / np.count_nonzero(m, axis=1)

# @Paul Panzer's soln
def ragged_mean(data, left, right):
    n, m = data.shape
    ps = np.zeros((n, m + 1), data.dtype)
    left, right = map(np.asarray, (left, right))
    rng = np.arange(len(data))
    np.cumsum(data, axis=1, out=ps[:, 1:])
    return (ps[rng, right] - ps[rng, left]) / (right - left)

# @Mad Physicist's soln
def row_mean(data, start, end):
    ind = np.stack((start, end), axis=0)
    ind += np.arange(data.shape[0]) * data.shape[1]
    ind = ind.ravel(order='F')
    if ind[-1] == data.size:
        ind = ind[:-1]
    return np.add.reduceat(data.ravel(), ind)[::2] / np.subtract(end, start)
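A note on ragged_mean: it builds an exclusive prefix sum per row, ps[i, k] = sum(data[i, :k]), so each row's slice sum is just a difference of two table entries, ps[i, right] - ps[i, left]. A minimal check of that identity on the question's sample:

```python
import numpy as np

data = np.arange(16).reshape(4, 4)
left = np.array([1, 0, 1, 2])
right = np.array([2, 1, 3, 4])

n, m = data.shape
ps = np.zeros((n, m + 1), data.dtype)
np.cumsum(data, axis=1, out=ps[:, 1:])  # ps[i, k] = sum of data[i, :k]

rng = np.arange(n)
sums = ps[rng, right] - ps[rng, left]   # per-row sums: 1, 4, 19, 29
means = sums / (right - left)           # per-row means: 1.0, 4.0, 9.5, 14.5
```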

1. Tall array

Using the given sample and tiling it along rows:

In [74]: data = np.arange(16).reshape(4,4)
    ...: start = [1,0,1,2]
    ...: end = [2,1,3,4]
    ...: 
    ...: N = 100000
    ...: data = np.repeat(data,N,axis=0)
    ...: start = np.tile(start,N)
    ...: end = np.tile(end,N)

In [75]: %timeit broadcast_sum(data, start, end)
    ...: %timeit broadcast_einsum(data, start, end)
    ...: %timeit ragged_mean(data, start, end)
    ...: %timeit row_mean(data, start, end)
41.4 ms ± 3.4 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
38.8 ms ± 996 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
24 ms ± 525 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
22.5 ms ± 1.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

2. Square array

Using a large square array (square like the given sample, but much bigger) -

In [76]: np.random.seed(0)
    ...: data = np.random.rand(10000,10000)
    ...: start = np.random.randint(0,5000,10000)
    ...: end = start + np.random.randint(1,5000,10000)

In [77]: %timeit broadcast_sum(data, start, end)
    ...: %timeit broadcast_einsum(data, start, end)
    ...: %timeit ragged_mean(data, start, end)
    ...: %timeit row_mean(data, start, end)
759 ms ± 3.13 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
514 ms ± 5.25 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
932 ms ± 4.78 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
72.5 ms ± 587 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
answered Oct 05 '22 by Divakar