I often need to do summations over certain rows or columns of a larger NumPy array. For example, take this array:
>>> c = np.arange(18).reshape(3, 6)
>>> print(c)
[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]]
Suppose I want to sum only where the row index is 0 or 2, AND the column index is either 0, 2, 4, or 5. In other words, I want the sum of the subarray
[[ 0  2  4  5]
 [12 14 16 17]]
I generally do this with NumPy's incredibly useful np.ix_ function; e.g.
>>> np.sum(c[np.ix_([0,2],[0,2,4,5])])
70
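For reference, np.ix_ builds an open mesh of index arrays: here a (2, 1) column of row indices and a (1, 4) row of column indices, which broadcast against each other to pick out exactly that subarray:
>>> rows, cols = np.ix_([0,2], [0,2,4,5])
>>> rows.shape, cols.shape
((2, 1), (1, 4))
>>> c[rows, cols]
array([[ 0,  2,  4,  5],
       [12, 14, 16, 17]])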
So far, so good. Now, however, suppose I have a different array, e, that is like c, but has two leading dimensions. So its shape is (2,3,3,6) instead of just (3,6):
e = np.arange(108).reshape(2, 3, 3, 6)
(Please note that the actual arrays I am working with might contain any random integers; they do not contain consecutive integers like this example.)
What I'm looking to do is the calculation above for each row/column combination. The following works for this simple example, but for larger arrays with more dimensions this can be really, really slow:
new_sum = np.empty((2,3))
for i in range(2):
    for j in range(3):
        temp_array = e[i,j,:,:]
        new_sum[i,j] = np.sum(temp_array[np.ix_([0,2],[0,2,4,5])])
The question: Can the above be accomplished in a faster way, presumably without resorting to using loops?
As a footnote, the result of the above is the following:
>>> print(new_sum)
[[ 70. 214. 358.]
 [502. 646. 790.]]
Of course, the 70 in the upper-left is the same result we got before.
You can create a boolean matrix (a mask) that has True at the positions you want to keep and False at those you don't.
>>> mask = np.zeros((3,6), dtype='bool')
>>> mask[np.ix_([0,2],[0,2,4,5])] = True
>>> mask
array([[ True, False,  True, False,  True,  True],
       [False, False, False, False, False, False],
       [ True, False,  True, False,  True,  True]])
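As a side note, the same mask could also be built from a row selector and a column selector combined with an outer AND (this is just a sketch; it assumes np.isin is available in your NumPy version):
rows = np.isin(np.arange(3), [0, 2])        # True for the wanted row indices
cols = np.isin(np.arange(6), [0, 2, 4, 5])  # True for the wanted column indices
mask_alt = rows[:, None] & cols[None, :]    # outer AND gives the same (3, 6) mask
assert np.array_equal(mask_alt, mask)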
You can then take advantage of NumPy's broadcasting rules to apply the mask to the array and sum over the last two dimensions:
>>> new_sum = np.sum(e * mask.reshape(1,1,3,6), axis=(2,3))
>>> new_sum
array([[ 70, 214, 358],
       [502, 646, 790]])
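As a variant (a sketch, not benchmarked here), the boolean mask can also be used to index the array directly, so that only the selected entries are summed instead of multiplying the whole array by the mask:
>>> e[..., mask].sum(axis=-1)   # mask picks the 8 wanted entries from the last two axes
array([[ 70, 214, 358],
       [502, 646, 790]])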
Here is a small script that compares the performance of the two versions on a bigger array:
import numpy as np
import time
N, P = 200, 100
e = np.arange(18*N*P).reshape(N, P, 3, 6)
t_start = time.time()
new_sum = np.empty((N,P))
for i in range(N):
    for j in range(P):
        temp_array = e[i,j,:,:]
        new_sum[i,j] = np.sum(temp_array[np.ix_([0,2],[0,2,4,5])])
print(f'Timer 1: {time.time()-t_start}s')
t_start = time.time()
mask = np.zeros((3,6), dtype='bool')
mask[np.ix_([0,2],[0,2,4,5])] = True
new_sum_2 = np.sum(e * mask.reshape(1,1,3,6), axis=(2,3))
print(f'Timer 2: {time.time()-t_start}s')
print('Results are equal!' if np.allclose(new_sum, new_sum_2) else 'Results differ!')
Output:
% python3 script.py
Timer 1: 0.4343228340148926s
Timer 2: 0.002004384994506836s
Results are equal!
As you can see, you get a significant improvement in computation time.
Just a slight extension of your own idea really: give some more dimensions to np.ix_, and then sum over the last two axes.
import numpy as np
e = np.arange(108).reshape(2, 3, 3, 6)
indices = [[0,2], [0,2,4,5]]
print(
    np.sum(e[np.ix_(*[range(i) for i in e.shape[:-2]], *indices)], axis=(-2,-1))
)
This gives:
array([[ 70, 214, 358],
       [502, 646, 790]])
So the arguments to np.ix_ in this case are
range(2), range(3), [0,2], [0,2,4,5]
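To see why this works: np.ix_ reshapes each of those four sequences so that each one indexes its own axis and they all broadcast together. A quick sketch:
idx = np.ix_(range(2), range(3), [0,2], [0,2,4,5])
print([a.shape for a in idx])   # [(2, 1, 1, 1), (1, 3, 1, 1), (1, 1, 2, 1), (1, 1, 1, 4)]
print(e[idx].shape)             # (2, 3, 2, 4): full leading axes, selected trailing entries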
For a bit more generality, we could also refrain from making assumptions about the number of axes used in the indices list:
np.sum(e[np.ix_(*[range(i) for i in e.shape[:-len(indices)]], *indices)],
       axis=tuple(range(-len(indices),0)))
(The data type will be the same as for e, which would have been the case in the example in the question if the same data type had been specified when calling np.empty. I'm assuming there's no particular reason to cast to np.float.)
A slightly faster approach (if your array is large and the selected indices are small) is to use np.einsum instead of np.sum, following the same idea as @alaniwi's solution:
np.einsum('ijkl->ij',e[np.ix_(np.arange(e.shape[0]),np.arange(e.shape[1]),[0,2],[0,2,4,5])])
[[ 70, 214, 358],
 [502, 646, 790]]
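As a further sketch, einsum can also be combined with the mask idea from the first answer, folding the mask multiplication and the summation into a single call so the intermediate (2, 3, 3, 6) product array is never materialized:
mask = np.zeros((3, 6), dtype=bool)
mask[np.ix_([0, 2], [0, 2, 4, 5])] = True
new_sum_3 = np.einsum('ijkl,kl->ij', e, mask)   # multiply by the mask and sum the last two axes in one step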