Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Summing values of 2D array on indices

I need to extend this question, which sums values of an array based on indices from a second array. Let A be the result array, B be the index array, and C the array to be summed over. Then A[i] = sum over C such that index(B) == i.

Instead, my setup is

N = 5
M = 2

A = np.zeros((M,N))
B = np.random.randint(M, size=N) # contains indices for A
C = np.random.rand(N,N)

I need A[i,j] = sum_{k in 0...N} C[j,k] such that C[k] == i , i.e. a rowsum conditional on the indices of B matching i. Is there an efficient way to do this? For my application N is around 10,000 and M is around 20. This operation is called for every iteration in a minimization problem... my current looping method is terribly slow.

Thanks!

like image 312
Kevin Avatar asked Apr 07 '13 16:04

Kevin


People also ask

How do you sum up a 2D array?

Let's see how to find the sum of the 2D array using the map function. Initialize the 2D array using lists. Pass the function sum and 2D array to the map function. Find the sum of resultant map object and print it.

How are 2D arrays indexed?

Two-dimensional (2D) arrays are indexed by two subscripts, one for the row and one for the column. Each element in the 2D array must by the same type, either a primitive type or object type.

How do you sum a 2D NumPy array?

Method 2: Using the sum() function in NumPy, numpy. sum(arr, axis, dtype, out) function returns the sum of array elements over the specified axis. To compute the sum of all columns the axis argument should be 0 in sum() function.

How do you index a multidimensional array?

Indexing multi-dimensional arrays Multi-dimensional arrays are indexed in GAUSS the same way that matrices are indexed, using square brackets [] . Scanning above, you can see that the value of the element at the intersection of the third row and second column of x1 is 8.


2 Answers

Following @DSM's comment, I'm assuming your C[k] == i supposed to be B[k] == i. If that's the case, does your loop version look something like this?

Nested Loop Version

import numpy as np

N = 5
M = 2

A = np.zeros((M,N))
B = np.random.randint(M, size=N) # contains indices for A
C = np.random.rand(N,N)

for i in range(M):
    for j in range(N):
        for k in range(N):
            if B[k] == i:
                A[i,j] += C[j,k]

There's more than one way to vectorize this problem. I'm going to show my thought process below, but there are more efficient ways to do it (e.g. @DSM's version that recognizes the matrix multiplication inherent in the problem).

For the sake of explanation, here's a walk-through of one approach.


Vectorizing the Inner Loop

Let's start by re-writing the inner k loop:

for i in range(M):
    for j in range(N):
        A[i,j] = C[j, B == i].sum()

It might be easier to think of this as C[j][B == i].sum(). We're just selecting jth row of C, selecting only the elements in that row where B is equal to i, and summing them.


Vectorizing the Outer-most Loop

Next let's break down the outer i loop. Now we're going to get to the point where readability will start to suffer, unfortunately...

i = np.arange(M)[:,np.newaxis]
mask = (B == i).astype(int)
for j in range(N):
    A[:,j] = (C[j] * mask).sum(axis=-1)

There are a couple different tricks here. In this case, we're iterating over the columns of A. Each column of A is the sum of a subset of the corresponding row of C. The subset of the row of C is determined by where B is equal to the row index i.

To get around iterating through i, we're making a 2D array where B == i by adding a new axis to i. (Have a look at the documentation for numpy broadcasting if you're confused by this.) In other words:

B:
    array([1, 1, 1, 1, 0])

i: 
    array([[0],
           [1]])

B == i:
    array([[False, False, False, False,  True],
           [ True,  True,  True,  True, False]], dtype=bool)

What we want is to take two (M) filtered sums of C[j], one for each row in B == i. This will give us a two-element vector corresponding to the jth column in A.

We can't do this by indexing C directly because the result won't maintain it's shape, as each row may have a different number of elements. We'll get around this by multiplying the B == i mask by the current row of C, resulting in zeros where B == i is False, and the value in the current row of C where it's true.

To do this, we need to turn the boolean array B == i into integers:

mask = (B == i).astype(int):
    array([[0, 0, 0, 0, 1],
           [1, 1, 1, 1, 0]])

So when we multiply it by the current row of C:

C[j]:
    array([ 0.19844887,  0.44858679,  0.35370919,  0.84074259,  0.74513377])

C[j] * mask:
    array([[ 0.        ,  0.        ,  0.        ,  0.        ,  0.74513377],
           [ 0.19844887,  0.44858679,  0.35370919,  0.84074259,  0.        ]])

Then we can sum over each row to get the current column of A (This will be broadcast to a column when it's assigned to A[:,j]):

(C[j] * mask).sum(axis=-1):
    array([ 0.74513377,  1.84148744])

Fully Vectorized Version

Finally, breaking down the last loop, we can apply the exact same principle to add a third dimension for the loop over j:

i = np.arange(M)[:,np.newaxis,np.newaxis]
mask = (B == i).astype(int)
A = (C * mask).sum(axis=-1)

@DSM's vectorized version

As @DSM suggested, you could also do:

A = (B == np.arange(M)[:,np.newaxis]).dot(C.T)

This is by far the fastest solution for most sizes of M and N, and arguably the most elegant (much more elegant than my solutions, anyway).

Let's break it down a bit.

The B == np.arange(M)[:,np.newaxis] is exactly equivalent to B == i in the "Vectorizing the Outer-most Loop" section above.

The key is in recognizing that all of the j and k loops are equivalent to matrix multiplication. dot will cast the boolean B == i array to the same dtype as C behind-the-scenes, so we don't need to worry about explicitly casting it to a different type.

After that, we're just performing matrix multiplication on the transpose of C (a 5x5 array) and the "mask" 0 and 1 array above, yielding a 2x5 array.

dot will take advantage of any optimized BLAS libraries you have installed (e.g. ATLAS, MKL), so it's very fast.


Timings

For small M's and N's, the differences are less apparent (~6x between looping and DSM's version):

M, N = 2, 5

%timeit loops(B,C,M)
10000 loops, best of 3: 83 us per loop

%timeit k_vectorized(B,C,M)
10000 loops, best of 3: 106 us per loop

%timeit vectorized(B,C,M)
10000 loops, best of 3: 23.7 us per loop

%timeit askewchan(B,C,M)
10000 loops, best of 3: 42.7 us per loop

%timeit einsum(B,C,M)
100000 loops, best of 3: 15.2 us per loop

%timeit dsm(B,C,M)
100000 loops, best of 3: 13.9 us per loop

However, once M and N start to grow, the difference becomes very significant (~600x) (note the units!):

M, N = 50, 20

%timeit loops(B,C,M)
10 loops, best of 3: 50.3 ms per loop

%timeit k_vectorized(B,C,M)
100 loops, best of 3: 10.5 ms per loop

%timeit ik_vectorized(B,C,M)
1000 loops, best of 3: 963 us per loop

%timeit vectorized(B,C,M)
1000 loops, best of 3: 247 us per loop

%timeit askewchan(B,C,M)
1000 loops, best of 3: 493 us per loop

%timeit einsum(B,C,M)
10000 loops, best of 3: 134 us per loop

%timeit dsm(B,C,M)
10000 loops, best of 3: 80.2 us per loop
like image 145
Joe Kington Avatar answered Nov 16 '22 03:11

Joe Kington


I am assuming @DSM found your typo, and you want:

A[i,j] = sum_{k in 0...N} C[j,k] where B[k] == i

Then you can loop over i in range(M) since M is relatively small.

A = np.array([C[:,B == i].sum(axis=1) for i in range(M)])
like image 29
askewchan Avatar answered Nov 16 '22 01:11

askewchan