How can I check if one two-dimensional NumPy array contains a specific pattern of values inside it?

Tags:

I have a large NumPy.array field_array and a smaller array match_array, both consisting of int values. Using the following example, how can I check if any match_array-shaped segment of field_array contains values that exactly correspond to the ones in match_array?

import numpy
raw_field = ( 24,  25,  26,  27,  28,  29,  30,  31,  23, \
              33,  34,  35,  36,  37,  38,  39,  40,  32, \
             -39, -38, -37, -36, -35, -34, -33, -32, -40, \
             -30, -29, -28, -27, -26, -25, -24, -23, -31, \
             -21, -20, -19, -18, -17, -16, -15, -14, -22, \
             -12, -11, -10,  -9,  -8,  -7,  -6,  -5, -13, \
              -3,  -2,  -1,   0,   1,   2,   3,   4,  -4, \
               6,   7,   8,   4,   5,   6,   7,  13,   5, \
              15,  16,  17,   8,   9,  10,  11,  22,  14)
field_array = numpy.array(raw_field, int).reshape(9,9)
match_array = numpy.arange(12).reshape(3,4)

These examples ought to return True since the pattern described by match_array aligns over [6:9,3:7].

750

asked Sep 11 '15 20:09

Augusta

2 Answers

Approach #1

This approach derives from a solution to Implement Matlab's im2col 'sliding' in python that was designed to rearrange sliding blocks from a 2D array into columns. Thus, to solve our case here, those sliding blocks from field_array could be stacked as columns and compared against column vector version of match_array.

Here's a formal definition of the function for the rearrangement/stacking -

def im2col(A,BLKSZ):   

    # Parameters
    M,N = A.shape
    col_extent = N - BLKSZ[1] + 1
    row_extent = M - BLKSZ[0] + 1

    # Get Starting block indices
    start_idx = np.arange(BLKSZ[0])[:,None]*N + np.arange(BLKSZ[1])

    # Get offsetted indices across the height and width of input array
    offset_idx = np.arange(row_extent)[:,None]*N + np.arange(col_extent)

    # Get all actual indices & index into input array for final output
    return np.take (A,start_idx.ravel()[:,None] + offset_idx.ravel())

To solve our case, here's the implementation based on im2col -

# Get sliding blocks of shape same as match_array from field_array into columns
# Then, compare them with a column vector version of match array.
col_match = im2col(field_array,match_array.shape) == match_array.ravel()[:,None]

# Shape of output array that has field_array compared against a sliding match_array
out_shape = np.asarray(field_array.shape) - np.asarray(match_array.shape) + 1

# Now, see if all elements in a column are ONES and reshape to out_shape. 
# Finally, find the position of TRUE indices
R,C = np.where(col_match.all(0).reshape(out_shape))

The output for the given sample in the question would be -

In [151]: R,C
Out[151]: (array([6]), array([3]))

Approach #2

Given that opencv already has template matching function that does square of differences, you can employ that and look for zero differences, which would be your matching positions. So, if you have access to cv2 (opencv module), the implementation would look something like this -

import cv2
from cv2 import matchTemplate as cv2m

M = cv2m(field_array.astype('uint8'),match_array.astype('uint8'),cv2.TM_SQDIFF)
R,C = np.where(M==0)

giving us -

In [204]: R,C
Out[204]: (array([6]), array([3]))

Benchmarking

This section compares runtimes for all the approaches suggested to solve the question. The credit for the various methods listed in this section goes to their contributors.

Method definitions -

def seek_array(search_in, search_for, return_coords = False):
    si_x, si_y = search_in.shape
    sf_x, sf_y = search_for.shape
    for y in xrange(si_y-sf_y+1):
        for x in xrange(si_x-sf_x+1):
            if numpy.array_equal(search_for, search_in[x:x+sf_x, y:y+sf_y]):
                return (x,y) if return_coords else True
    return None if return_coords else False

def skimage_based(field_array,match_array):
    windows = view_as_windows(field_array, match_array.shape)
    return (windows == match_array).all(axis=(2,3)).nonzero()

def im2col_based(field_array,match_array):   
    col_match = im2col(field_array,match_array.shape)==match_array.ravel()[:,None]
    out_shape = np.asarray(field_array.shape) - np.asarray(match_array.shape) + 1  
    return np.where(col_match.all(0).reshape(out_shape))

def cv2_based(field_array,match_array):
    M = cv2m(field_array.astype('uint8'),match_array.astype('uint8'),cv2.TM_SQDIFF)
    return np.where(M==0)

Runtime tests -

Case # 1 (Sample data from question):

In [11]: field_array
Out[11]: 
array([[ 24,  25,  26,  27,  28,  29,  30,  31,  23],
       [ 33,  34,  35,  36,  37,  38,  39,  40,  32],
       [-39, -38, -37, -36, -35, -34, -33, -32, -40],
       [-30, -29, -28, -27, -26, -25, -24, -23, -31],
       [-21, -20, -19, -18, -17, -16, -15, -14, -22],
       [-12, -11, -10,  -9,  -8,  -7,  -6,  -5, -13],
       [ -3,  -2,  -1,   0,   1,   2,   3,   4,  -4],
       [  6,   7,   8,   4,   5,   6,   7,  13,   5],
       [ 15,  16,  17,   8,   9,  10,  11,  22,  14]])

In [12]: match_array
Out[12]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [13]: %timeit seek_array(field_array, match_array, return_coords = False)
1000 loops, best of 3: 465 µs per loop

In [14]: %timeit skimage_based(field_array,match_array)
10000 loops, best of 3: 97.9 µs per loop

In [15]: %timeit im2col_based(field_array,match_array)
10000 loops, best of 3: 74.3 µs per loop

In [16]: %timeit cv2_based(field_array,match_array)
10000 loops, best of 3: 30 µs per loop

Case #2 (Bigger random data):

In [17]: field_array = np.random.randint(0,4,(256,256))

In [18]: match_array = field_array[100:116,100:116].copy()

In [19]: %timeit seek_array(field_array, match_array, return_coords = False)
1 loops, best of 3: 400 ms per loop

In [20]: %timeit skimage_based(field_array,match_array)
10 loops, best of 3: 54.3 ms per loop

In [21]: %timeit im2col_based(field_array,match_array)
10 loops, best of 3: 125 ms per loop

In [22]: %timeit cv2_based(field_array,match_array)
100 loops, best of 3: 4.08 ms per loop

answered Oct 23 '22 23:10

Divakar

There's no such search function built in to NumPy, but it is certainly possible to do in NumPy

As long as your arrays are not too massive*, you could use a rolling window approach:

from skimage.util import view_as_windows

windows = view_as_windows(field_array, match_array.shape)

The function view_as_windows is written purely in NumPy so if you don't have skimage you can always copy the code from here.

Then to see if the sub-array appears in the larger array, you can write:

>>> (windows == match_array).all(axis=(2,3)).any()
True

To find the indices of where the top-left corner of the sub-array matches, you can write:

>>> (windows == match_array).all(axis=(2,3)).nonzero()
(array([6]), array([3]))

This approach should also work for arrays of higher dimensions.

*although the array windows takes up no additional memory (only the strides and shape are changed to create a new view of the data), writing windows == match_array creates a boolean array of size (7, 6, 3, 4) which is 504 bytes of memory. If you're working with very large arrays, this approach might not be feasible.

answered Oct 23 '22 23:10

Alex Riley

Related questions
                            
                                Python: lists and copy of them
                            
                                How can I read the contents of all the files in a directory with pandas?
                            
                                Python: Random number generator with mean and Standard Deviation
                            
                                SQLAlchemy: Convert column value back and forth between internal and database format
                            
                                How can I generate a colormap array from a simple array in matplotlib
                            
                                Python Numpy mask NaN not working
                            
                                Python for loop decrementing index
                            
                                Count occurrences of item in JSON element
                            
                                Convert Content-Type header into file extension
                            
                                Python fails to open 11gb csv in r+ mode but opens in r mode
                            
                                PEP 8 and list comprehension
                            
                                How can I export an instance and all its related objects in Django?
                            
                                Pass Python list to embedded Rust function
                            
                                Alternative to `any` that returns the last evaluated object?
                            
                                Selenium pdf automatic download not working
                            
                                Flask SqlAlchemy join two models without foreign key MYSQL
                            
                                How to estimate density function and calculate its peaks?
                            
                                Python watchdog windows wait till copy finishes
                            
                                Get the length of reversed list
                            
                                Assign line colors in pandas

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How can I check if one two-dimensional NumPy array contains a specific pattern of values inside it?

Tags:

python

arrays

pattern-matching

numpy