I need to extend this question, which sums values of an array based on indices from a second array. Let A
be the result array, B
be the index array, and C
the array to be summed over. Then A[i] = sum
over C
such that index(B) == i
.
Instead, my setup is
N = 5
M = 2
A = np.zeros((M,N))
B = np.random.randint(M, size=N) # contains indices for A
C = np.random.rand(N,N)
I need A[i,j] = sum_{k in 0...N} C[j,k]
such that C[k] == i
, i.e. a rowsum conditional on the indices of B matching i. Is there an efficient way to do this? For my application N is around 10,000 and M is around 20. This operation is called for every iteration in a minimization problem... my current looping method is terribly slow.
Thanks!
Let's see how to find the sum of the 2D array using the map function. Initialize the 2D array using lists. Pass the function sum and 2D array to the map function. Find the sum of resultant map object and print it.
Two-dimensional (2D) arrays are indexed by two subscripts, one for the row and one for the column. Each element in the 2D array must by the same type, either a primitive type or object type.
Method 2: Using the sum() function in NumPy, numpy. sum(arr, axis, dtype, out) function returns the sum of array elements over the specified axis. To compute the sum of all columns the axis argument should be 0 in sum() function.
Indexing multi-dimensional arrays Multi-dimensional arrays are indexed in GAUSS the same way that matrices are indexed, using square brackets [] . Scanning above, you can see that the value of the element at the intersection of the third row and second column of x1 is 8.
Following @DSM's comment, I'm assuming your C[k] == i
supposed to be B[k] == i
. If that's the case, does your loop version look something like this?
import numpy as np
N = 5
M = 2
A = np.zeros((M,N))
B = np.random.randint(M, size=N) # contains indices for A
C = np.random.rand(N,N)
for i in range(M):
for j in range(N):
for k in range(N):
if B[k] == i:
A[i,j] += C[j,k]
There's more than one way to vectorize this problem. I'm going to show my thought process below, but there are more efficient ways to do it (e.g. @DSM's version that recognizes the matrix multiplication inherent in the problem).
For the sake of explanation, here's a walk-through of one approach.
Let's start by re-writing the inner k
loop:
for i in range(M):
for j in range(N):
A[i,j] = C[j, B == i].sum()
It might be easier to think of this as C[j][B == i].sum()
. We're just selecting j
th row of C
, selecting only the elements in that row where B
is equal to i
, and summing them.
Next let's break down the outer i
loop. Now we're going to get to the point where readability will start to suffer, unfortunately...
i = np.arange(M)[:,np.newaxis]
mask = (B == i).astype(int)
for j in range(N):
A[:,j] = (C[j] * mask).sum(axis=-1)
There are a couple different tricks here. In this case, we're iterating over the columns of A
. Each column of A
is the sum of a subset of the corresponding row of C
. The subset of the row of C
is determined by where B
is equal to the row index i
.
To get around iterating through i
, we're making a 2D array where B == i
by adding a new axis to i. (Have a look at the documentation for numpy broadcasting if you're confused by this.) In other words:
B:
array([1, 1, 1, 1, 0])
i:
array([[0],
[1]])
B == i:
array([[False, False, False, False, True],
[ True, True, True, True, False]], dtype=bool)
What we want is to take two (M
) filtered sums of C[j]
, one for each row in B == i
. This will give us a two-element vector corresponding to the j
th column in A
.
We can't do this by indexing C
directly because the result won't maintain it's shape, as each row may have a different number of elements. We'll get around this by multiplying the B == i
mask by the current row of C
, resulting in zeros where B == i
is False
, and the value in the current row of C
where it's true.
To do this, we need to turn the boolean array B == i
into integers:
mask = (B == i).astype(int):
array([[0, 0, 0, 0, 1],
[1, 1, 1, 1, 0]])
So when we multiply it by the current row of C
:
C[j]:
array([ 0.19844887, 0.44858679, 0.35370919, 0.84074259, 0.74513377])
C[j] * mask:
array([[ 0. , 0. , 0. , 0. , 0.74513377],
[ 0.19844887, 0.44858679, 0.35370919, 0.84074259, 0. ]])
Then we can sum over each row to get the current column of A
(This will be broadcast to a column when it's assigned to A[:,j]
):
(C[j] * mask).sum(axis=-1):
array([ 0.74513377, 1.84148744])
Finally, breaking down the last loop, we can apply the exact same principle to add a third dimension for the loop over j
:
i = np.arange(M)[:,np.newaxis,np.newaxis]
mask = (B == i).astype(int)
A = (C * mask).sum(axis=-1)
As @DSM suggested, you could also do:
A = (B == np.arange(M)[:,np.newaxis]).dot(C.T)
This is by far the fastest solution for most sizes of M
and N
, and arguably the most elegant (much more elegant than my solutions, anyway).
Let's break it down a bit.
The B == np.arange(M)[:,np.newaxis]
is exactly equivalent to B == i
in the "Vectorizing the Outer-most Loop" section above.
The key is in recognizing that all of the j
and k
loops are equivalent to matrix multiplication. dot
will cast the boolean B == i
array to the same dtype as C
behind-the-scenes, so we don't need to worry about explicitly casting it to a different type.
After that, we're just performing matrix multiplication on the transpose of C
(a 5x5 array) and the "mask" 0 and 1 array above, yielding a 2x5 array.
dot
will take advantage of any optimized BLAS libraries you have installed (e.g. ATLAS
, MKL
), so it's very fast.
For small M
's and N
's, the differences are less apparent (~6x between looping and DSM's version):
M, N = 2, 5
%timeit loops(B,C,M)
10000 loops, best of 3: 83 us per loop
%timeit k_vectorized(B,C,M)
10000 loops, best of 3: 106 us per loop
%timeit vectorized(B,C,M)
10000 loops, best of 3: 23.7 us per loop
%timeit askewchan(B,C,M)
10000 loops, best of 3: 42.7 us per loop
%timeit einsum(B,C,M)
100000 loops, best of 3: 15.2 us per loop
%timeit dsm(B,C,M)
100000 loops, best of 3: 13.9 us per loop
However, once M
and N
start to grow, the difference becomes very significant (~600x) (note the units!):
M, N = 50, 20
%timeit loops(B,C,M)
10 loops, best of 3: 50.3 ms per loop
%timeit k_vectorized(B,C,M)
100 loops, best of 3: 10.5 ms per loop
%timeit ik_vectorized(B,C,M)
1000 loops, best of 3: 963 us per loop
%timeit vectorized(B,C,M)
1000 loops, best of 3: 247 us per loop
%timeit askewchan(B,C,M)
1000 loops, best of 3: 493 us per loop
%timeit einsum(B,C,M)
10000 loops, best of 3: 134 us per loop
%timeit dsm(B,C,M)
10000 loops, best of 3: 80.2 us per loop
I am assuming @DSM found your typo, and you want:
A[i,j] = sum_{k in 0...N} C[j,k] where B[k] == i
Then you can loop over i in range(M)
since M
is relatively small.
A = np.array([C[:,B == i].sum(axis=1) for i in range(M)])
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With