Summing values of 2D array on indices

Tags:

I need to extend this question, which sums values of an array based on indices from a second array. Let A be the result array, B be the index array, and C the array to be summed over. Then A[i] = sum over C such that index(B) == i.

Instead, my setup is

N = 5
M = 2

A = np.zeros((M,N))
B = np.random.randint(M, size=N) # contains indices for A
C = np.random.rand(N,N)

I need A[i,j] = sum_{k in 0...N} C[j,k] such that C[k] == i , i.e. a rowsum conditional on the indices of B matching i. Is there an efficient way to do this? For my application N is around 10,000 and M is around 20. This operation is called for every iteration in a minimization problem... my current looping method is terribly slow.

Thanks!

312

asked Apr 07 '13 16:04

Kevin

2 Answers

Following @DSM's comment, I'm assuming your C[k] == i supposed to be B[k] == i. If that's the case, does your loop version look something like this?

Nested Loop Version

import numpy as np

N = 5
M = 2

A = np.zeros((M,N))
B = np.random.randint(M, size=N) # contains indices for A
C = np.random.rand(N,N)

for i in range(M):
    for j in range(N):
        for k in range(N):
            if B[k] == i:
                A[i,j] += C[j,k]

There's more than one way to vectorize this problem. I'm going to show my thought process below, but there are more efficient ways to do it (e.g. @DSM's version that recognizes the matrix multiplication inherent in the problem).

For the sake of explanation, here's a walk-through of one approach.

Vectorizing the Inner Loop

Let's start by re-writing the inner k loop:

for i in range(M):
    for j in range(N):
        A[i,j] = C[j, B == i].sum()

It might be easier to think of this as C[j][B == i].sum(). We're just selecting j^th row of C, selecting only the elements in that row where B is equal to i, and summing them.

Vectorizing the Outer-most Loop

Next let's break down the outer i loop. Now we're going to get to the point where readability will start to suffer, unfortunately...

i = np.arange(M)[:,np.newaxis]
mask = (B == i).astype(int)
for j in range(N):
    A[:,j] = (C[j] * mask).sum(axis=-1)

There are a couple different tricks here. In this case, we're iterating over the columns of A. Each column of A is the sum of a subset of the corresponding row of C. The subset of the row of C is determined by where B is equal to the row index i.

To get around iterating through i, we're making a 2D array where B == i by adding a new axis to i. (Have a look at the documentation for numpy broadcasting if you're confused by this.) In other words:

B:
    array([1, 1, 1, 1, 0])

i: 
    array([[0],
           [1]])

B == i:
    array([[False, False, False, False,  True],
           [ True,  True,  True,  True, False]], dtype=bool)

What we want is to take two (M) filtered sums of C[j], one for each row in B == i. This will give us a two-element vector corresponding to the j^th column in A.

We can't do this by indexing C directly because the result won't maintain it's shape, as each row may have a different number of elements. We'll get around this by multiplying the B == i mask by the current row of C, resulting in zeros where B == i is False, and the value in the current row of C where it's true.

To do this, we need to turn the boolean array B == i into integers:

mask = (B == i).astype(int):
    array([[0, 0, 0, 0, 1],
           [1, 1, 1, 1, 0]])

So when we multiply it by the current row of C:

C[j]:
    array([ 0.19844887,  0.44858679,  0.35370919,  0.84074259,  0.74513377])

C[j] * mask:
    array([[ 0.        ,  0.        ,  0.        ,  0.        ,  0.74513377],
           [ 0.19844887,  0.44858679,  0.35370919,  0.84074259,  0.        ]])

Then we can sum over each row to get the current column of A (This will be broadcast to a column when it's assigned to A[:,j]):

(C[j] * mask).sum(axis=-1):
    array([ 0.74513377,  1.84148744])

Fully Vectorized Version

Finally, breaking down the last loop, we can apply the exact same principle to add a third dimension for the loop over j:

i = np.arange(M)[:,np.newaxis,np.newaxis]
mask = (B == i).astype(int)
A = (C * mask).sum(axis=-1)

@DSM's vectorized version

As @DSM suggested, you could also do:

A = (B == np.arange(M)[:,np.newaxis]).dot(C.T)

This is by far the fastest solution for most sizes of M and N, and arguably the most elegant (much more elegant than my solutions, anyway).

Let's break it down a bit.

The B == np.arange(M)[:,np.newaxis] is exactly equivalent to B == i in the "Vectorizing the Outer-most Loop" section above.

The key is in recognizing that all of the j and k loops are equivalent to matrix multiplication. dot will cast the boolean B == i array to the same dtype as C behind-the-scenes, so we don't need to worry about explicitly casting it to a different type.

After that, we're just performing matrix multiplication on the transpose of C (a 5x5 array) and the "mask" 0 and 1 array above, yielding a 2x5 array.

dot will take advantage of any optimized BLAS libraries you have installed (e.g. ATLAS, MKL), so it's very fast.

Timings

For small M's and N's, the differences are less apparent (~6x between looping and DSM's version):

M, N = 2, 5

%timeit loops(B,C,M)
10000 loops, best of 3: 83 us per loop

%timeit k_vectorized(B,C,M)
10000 loops, best of 3: 106 us per loop

%timeit vectorized(B,C,M)
10000 loops, best of 3: 23.7 us per loop

%timeit askewchan(B,C,M)
10000 loops, best of 3: 42.7 us per loop

%timeit einsum(B,C,M)
100000 loops, best of 3: 15.2 us per loop

%timeit dsm(B,C,M)
100000 loops, best of 3: 13.9 us per loop

However, once M and N start to grow, the difference becomes very significant (~600x) (note the units!):

M, N = 50, 20

%timeit loops(B,C,M)
10 loops, best of 3: 50.3 ms per loop

%timeit k_vectorized(B,C,M)
100 loops, best of 3: 10.5 ms per loop

%timeit ik_vectorized(B,C,M)
1000 loops, best of 3: 963 us per loop

%timeit vectorized(B,C,M)
1000 loops, best of 3: 247 us per loop

%timeit askewchan(B,C,M)
1000 loops, best of 3: 493 us per loop

%timeit einsum(B,C,M)
10000 loops, best of 3: 134 us per loop

%timeit dsm(B,C,M)
10000 loops, best of 3: 80.2 us per loop

145

answered Nov 16 '22 03:11

Joe Kington

I am assuming @DSM found your typo, and you want:

A[i,j] = sum_{k in 0...N} C[j,k] where B[k] == i

Then you can loop over i in range(M) since M is relatively small.

A = np.array([C[:,B == i].sum(axis=1) for i in range(M)])

answered Nov 16 '22 01:11

askewchan

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Summing values of 2D array on indices

Tags:

python

indexing

numpy

rowsum

Kevin

People also ask

2 Answers

Nested Loop Version

Vectorizing the Inner Loop

Vectorizing the Outer-most Loop

Fully Vectorized Version

@DSM's vectorized version

Timings

Joe Kington

askewchan

Recent Activity

Donate For Us

Summing values of 2D array on indices

Tags:

python

indexing

numpy

rowsum

Kevin

People also ask

2 Answers

Nested Loop Version

Vectorizing the Inner Loop

Vectorizing the Outer-most Loop

Fully Vectorized Version

@DSM's vectorized version

Timings

Joe Kington

askewchan

Related questions

Recent Activity

Donate For Us