Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficiently applying function over moving window numpy array

Tags:

I have around 100,000 two dimensional arrays where I need to apply local filters. Both dimensions have an even size and the window is over a 2x2 piece and shifts 2 pieces further, so that every element is in a window once. The output is a binary two dimensional array of the same size and my filter is a binary 2x2 piece as well. The parts of my filter that are a 0 will map to a 0, the parts of my filter that is a 1 all map to a 1 if they have the same value and map to 0 if they are not all the same. Here is an example:

Filter:  0 1     Array to filter:  1 2 3 2    Output:  0 1 0 0
         1 0                       2 3 3 3             1 0 0 0

Of course I can do this using a double for loop however this is very inefficient and there has to be a better way. I read this: Vectorized moving window on 2D array in numpy however I am uncertain how I would apply that to my case.

like image 820
Jan van der Vegt Avatar asked May 29 '16 19:05

Jan van der Vegt


People also ask

Is appending to NumPy array efficient?

Appending to numpy arrays is very inefficient. This is because the interpreter needs to find and assign memory for the entire array at every single step. Depending on the application, there are much better strategies. If you know the length in advance, it is best to pre-allocate the array using a function like np.

Why are NumPy functions faster?

NumPy arrays are faster and more compact than Python lists. An array consumes less memory and is convenient to use. NumPy uses much less memory to store data and it provides a mechanism of specifying the data types. This allows the code to be optimized even further.

Why do NumPy array operations have better performance compared to Python functions and loops?

NumPy Arrays are faster than Python Lists because of the following reasons: An array is a collection of homogeneous data-types that are stored in contiguous memory locations. On the other hand, a list in Python is a collection of heterogeneous data types stored in non-contiguous memory locations.


1 Answers

You can split each 2x2 subarray and then reshape such that each windowed block becomes a row in a 2D array. From each row, extract out the elements corresponding to f==1 positions using boolean indexing. Then, look to see if all extracted elements are identical along each row, to give us a mask. Use this mask to multiply with f for the final binary output after reshaping.

Thus, assuming f as the filter array and A as the data array, a vectorized implementation to follow such steps would look like this -

# Setup size parameters
M = A.shape[0]
Mh = M/2
N = A.shape[1]/2

# Reshape input array to 4D such that the last two axes represent the 
# windowed block at each iteration of the intended operation      
A4D = A.reshape(-1,2,N,2).swapaxes(1,2)

# Determine the binary array whether all elements mapped against 1 
# in the filter array are the same elements or not
S = (np.diff(A4D.reshape(-1,4)[:,f.ravel()==1],1)==0).all(1)

# Finally multiply the binary array with f to get desired binary output
out = (S.reshape(Mh,N)[:,None,:,None]*f[:,None,:]).reshape(M,-1)

Sample run -

1) Inputs :

In [58]: A
Out[58]: 
array([[1, 1, 1, 1, 2, 1],
       [1, 1, 3, 1, 2, 2],
       [1, 3, 3, 3, 2, 3],
       [3, 3, 3, 3, 3, 1]])

In [59]: f
Out[59]: 
array([[0, 1],
       [1, 1]])

2) Intermediate outputs :

In [60]: A4D
Out[60]: 
array([[[[1, 1],
         [1, 1]],

        [[1, 1],
         [3, 1]],

        [[2, 1],
         [2, 2]]],


       [[[1, 3],
         [3, 3]],

        [[3, 3],
         [3, 3]],

        [[2, 3],
         [3, 1]]]])

In [61]: S
Out[61]: array([ True, False, False,  True,  True, False], dtype=bool)

3) Final output :

In [62]: out
Out[62]: 
array([[0, 1, 0, 0, 0, 0],
       [1, 1, 0, 0, 0, 0],
       [0, 1, 0, 1, 0, 0],
       [1, 1, 1, 1, 0, 0]])
like image 63
Divakar Avatar answered Sep 28 '22 02:09

Divakar