I have an array and I would like to produce a smaller array by scanning a 2x2 non-overlappingly windows and getting the maximum. Here is an example:
import numpy as np
np.random.seed(123)
np.set_printoptions(linewidth=1000,precision=3)
arr = np.random.uniform(-1,1,(4,4))
res = np.zeros((2,2))
for i in xrange(res.shape[0]):
for j in xrange(res.shape[1]):
ii = i*2
jj = j*2
res[i][j] = max(arr[ii][jj],arr[ii+1][jj],arr[ii][jj+1],arr[ii+1][jj+1])
print arr
print res
So a matrix like this:
[[ 0.393 -0.428 -0.546 0.103]
[ 0.439 -0.154 0.962 0.37 ]
[-0.038 -0.216 -0.314 0.458]
[-0.123 -0.881 -0.204 0.476]]
Should become this:
[[ 0.439 0.962]
[-0.038 0.476]]
How can I do this more efficiently?
numpy.maximum() function is used to find the element-wise maximum of array elements. It compares two arrays and returns a new array containing the element-wise maxima. If one of the elements being compared is a NaN, then that element is returned.
Having said that, numpy.max (and most of the other NumPy functions) will operate on any “array like sequence” of data. That means that the argument to the a parameter can be a Python list, a Python tuple, or one of several other Python sequences.
If you use numpy.max on this 2-d array (without the axis parameter), then the output will be a single number, a scalar. Scalars have zero dimensions. Two dimensions in, zero dimension out. The NumPy max function effectively reduces the dimensions between the input and the output.
Keep in mind that the maximum value is the same: 8. It’s just that the dimensions of the output is different. By setting keepdims = True, we change the structure of the output … instead of being a scalar, the output is actually a 2-d NumPy array with a single value ( 8 ).
You can do this:
print arr.reshape(2,2,2,2).swapaxes(1,2).reshape(2,2,4).max(axis=-1)
[[ 0.439 0.962]
[-0.038 0.476]]
To explain starting with:
arr=np.array([[0.393,-0.428,-0.546,0.103],
[0.439,-0.154,0.962,0.37,],
[-0.038,-0.216,-0.314,0.458],
[-0.123,-0.881,-0.204,0.476]])
We first want to group the axes into relevant sections.
tmp = arr.reshape(2,2,2,2).swapaxes(1,2)
print tmp
[[[[ 0.393 -0.428]
[ 0.439 -0.154]]
[[-0.546 0.103]
[ 0.962 0.37 ]]]
[[[-0.038 -0.216]
[-0.123 -0.881]]
[[-0.314 0.458]
[-0.204 0.476]]]]
Reshape once more to obtain the groups of data we want:
tmp = tmp.reshape(2,2,4)
print tmp
[[[ 0.393 -0.428 0.439 -0.154]
[-0.546 0.103 0.962 0.37 ]]
[[-0.038 -0.216 -0.123 -0.881]
[-0.314 0.458 -0.204 0.476]]]
Finally take the max along the last axis.
This can be generalized, for square matrices, to:
k = arr.shape[0]/2
arr.reshape(k,2,k,2).swapaxes(1,2).reshape(k,k,4).max(axis=-1)
Following the comments of Jamie and Dougal we can generalize this further:
n = 2 #Height of window
m = 2 #Width of window
k = arr.shape[0] / n #Must divide evenly
l = arr.shape[1] / m #Must divide evenly
arr.reshape(k,n,l,m).max(axis=(-1,-3)) #Numpy >= 1.7.1
arr.reshape(k,n,l,m).max(axis=-3).max(axis=-1) #Numpy < 1.7.1
As I mentioned in the comment area, consider using NumBa. You could leave your double loop just as it is, add about 10 characters in a decorator, and get C-like performance for this. Easy to use out-of-the-box if you work with Continuum Analytics' "Anaconda" distribution of Python.
This is almost a perfect use case for NumBa because this algorithm is much more naturally expressed with the double loop. The reshaping approach exploits fast array operations, but it's extremely unreadable unless you already know the goal of the program. It's highly desirable to leave functions like this in the expanded form and achieve speed by letting something else convert to a low-level language after the fact.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With