How can I vectorize the averaging of 2x2 sub-arrays of numpy array?

I have a very large 2D numpy array made up of 2x2 sub-arrays, and I need the average of each one. I am looking for a way to vectorize this operation. For example, given x:

#               |- col 0 -|   |- col 1 -|   |- col 2 -|       
x = np.array( [[ 0.0,   1.0,   2.0,   3.0,   4.0,   5.0],  # row 0
               [ 6.0,   7.0,   8.0,   9.0,  10.0,  11.0],  # row 0
               [12.0,  13.0,  14.0,  15.0,  16.0,  17.0],  # row 1
               [18.0,  19.0,  20.0,  21.0,  22.0,  23.0]]) # row 1

I need to end up with a 2x3 array whose elements are the averages of each 2x2 sub-array, i.e.:

result = np.array( [[ 3.5,  5.5,  7.5],
                    [15.5, 17.5, 19.5]])

So element [0,0] is calculated as the average of x[0:2, 0:2], while element [0,1] would be the average of x[0:2, 2:4]. Does numpy have vectorized/efficient ways of doing aggregates on subsets like this?

MarkD asked Nov 11 '14 17:11


1 Answer

If we form the reshaped matrix y = x.reshape(2,2,3,2), then the (i,j) 2x2 submatrix is given by y[i,:,j,:]. E.g.:

In [340]: x
Out[340]: 
array([[  0.,   1.,   2.,   3.,   4.,   5.],
       [  6.,   7.,   8.,   9.,  10.,  11.],
       [ 12.,  13.,  14.,  15.,  16.,  17.],
       [ 18.,  19.,  20.,  21.,  22.,  23.]])

In [341]: y = x.reshape(2,2,3,2)

In [342]: y[0,:,0,:]
Out[342]: 
array([[ 0.,  1.],
       [ 6.,  7.]])

In [343]: y[1,:,2,:]
Out[343]: 
array([[ 16.,  17.],
       [ 22.,  23.]])

To get the mean of the 2x2 submatrices, use the mean method, with axis=(1,3):

In [344]: y.mean(axis=(1,3))
Out[344]: 
array([[  3.5,   5.5,   7.5],
       [ 15.5,  17.5,  19.5]])

If you are using an older version of numpy that doesn't support using a tuple for the axis, you could do:

In [345]: y.mean(axis=1).mean(axis=-1)
Out[345]: 
array([[  3.5,   5.5,   7.5],
       [ 15.5,  17.5,  19.5]])

See the link given by @dashesy in a comment for more background on the reshaping "trick".
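As a quick sanity check on why the reshape works, here is a minimal sketch (not part of the original session) that rebuilds the 4x6 example with `np.arange` and confirms that every `y[i, :, j, :]` matches the corresponding 2x2 slice of `x`. The variable name `result` is just illustrative.

```python
import numpy as np

# Rebuild the 4x6 example array from the question (values 0.0 .. 23.0).
x = np.arange(24, dtype=float).reshape(4, 6)

# Split each axis into (block index, position within block).
y = x.reshape(2, 2, 3, 2)

# Each block selected through the 4-d view should equal the plain 2-d slice.
for i in range(2):
    for j in range(3):
        assert np.array_equal(y[i, :, j, :], x[2 * i:2 * i + 2, 2 * j:2 * j + 2])

# Averaging over the two "within block" axes gives one value per block.
result = y.mean(axis=(1, 3))
print(result)
```

The reshape relies on the array being laid out row by row (C order), which is numpy's default for a freshly created array.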


To generalize this to a 2-d array with shape (m, n), where m and n are even, use

y = x.reshape(x.shape[0]//2, 2, x.shape[1]//2, 2)

y can then be interpreted as an array of 2x2 arrays. The first and third index slots of the 4-d array act as the indices that select one of the 2x2 blocks. To get the upper left 2x2 block, use y[0, :, 0, :]; to get the block in the second row and third column of blocks, use y[1, :, 2, :]; and in general, to access block (j, k), use y[j, :, k, :].

To compute the reduced array of averages of these blocks, use the mean method, with axis=(1, 3) (i.e. average over axes 1 and 3):

avg = y.mean(axis=(1, 3))
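The same idea can probably be wrapped in a small helper. The sketch below assumes a hypothetical function name `block_mean` and hypothetical parameters `bh` and `bw` (block height and width); none of these come from the answer above. It also assumes the array shape divides evenly by the block size and raises an error otherwise.

```python
import numpy as np

def block_mean(x, bh=2, bw=2):
    """Average non-overlapping bh x bw blocks of a 2-d array.

    Hypothetical helper for illustration; assumes both dimensions
    of x are exact multiples of the block size.
    """
    x = np.asarray(x)
    m, n = x.shape
    if m % bh or n % bw:
        raise ValueError("array shape must be a multiple of the block shape")
    # Integer division (//) keeps the reshape arguments as ints on Python 3.
    return x.reshape(m // bh, bh, n // bw, bw).mean(axis=(1, 3))

avg = block_mean(np.arange(24, dtype=float).reshape(4, 6))
print(avg)
```

Passing other block sizes, for example `block_mean(a, 3, 3)` on a (6, 9) array, should give a (2, 3) result under the same divisibility assumption.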

Here's an example where x has shape (8, 10), so the array of averages of the 2x2 blocks has shape (4, 5):

In [10]: np.random.seed(123)

In [11]: x = np.random.randint(0, 4, size=(8, 10))

In [12]: x
Out[12]: 
array([[2, 1, 2, 2, 0, 2, 2, 1, 3, 2],
       [3, 1, 2, 1, 0, 1, 2, 3, 1, 0],
       [2, 0, 3, 1, 3, 2, 1, 0, 0, 0],
       [0, 1, 3, 3, 2, 0, 3, 2, 0, 3],
       [0, 1, 0, 3, 1, 3, 0, 0, 0, 2],
       [1, 1, 2, 2, 3, 2, 1, 0, 0, 3],
       [2, 1, 0, 3, 2, 2, 2, 2, 1, 2],
       [0, 3, 3, 3, 1, 0, 2, 0, 2, 1]])

In [13]: y = x.reshape(x.shape[0]//2, 2, x.shape[1]//2, 2)

Take a look at a couple of the 2x2 blocks:

In [14]: y[0, :, 0, :]
Out[14]: 
array([[2, 1],
       [3, 1]])

In [15]: y[1, :, 2, :]
Out[15]: 
array([[3, 2],
       [2, 0]])

Compute the averages of the blocks:

In [16]: avg = y.mean(axis=(1, 3))

In [17]: avg
Out[17]: 
array([[ 1.75,  1.75,  0.75,  2.  ,  1.5 ],
       [ 0.75,  2.5 ,  1.75,  1.5 ,  0.75],
       [ 0.75,  1.75,  2.25,  0.25,  1.25],
       [ 1.5 ,  2.25,  1.25,  1.5 ,  1.5 ]])

Warren Weckesser answered Nov 15 '22 09:11
