Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding Patterns in a Numpy Array

I am trying to find patterns in a numpy array, called values. I'd like to return the starting index position of the pattern. I know I could iterative over each element and check whether that element and the next one match the pattern, but over a large dataset that is incredibly inefficient and am looking for a better alternative.

I've got a working solution using np.where for searching for a single value, but I can't get it to work with finding a pattern or two numbers.

Example:

import numpy as np
values = np.array([0,1,2,1,2,4,5,6,1,2,1])
searchval = [1,2]
print  np.where(values == searchval)[0]

Output:

[]

Expected Output:

[1, 3, 8]
like image 238
user2242044 Avatar asked Feb 27 '17 17:02

user2242044


3 Answers

Here's a straight forward approach to using where. Start with a logical expression that finds the matches:

In [670]: values = np.array([0,1,2,1,2,4,5,6,1,2,1])
     ...: searchval = [1,2]
     ...: 
In [671]: (values[:-1]==searchval[0]) & (values[1:]==searchval[1])
Out[671]: array([False,  True, False,  True, False, False, False, False,  True, False], dtype=bool)
In [672]: np.where(_)
Out[672]: (array([1, 3, 8], dtype=int32),)

That could be generalized into a loop that operates on multiple searchval. Getting the slice range correct will take some fiddling. The roll suggested in another answer might be easier, but I suspect a bit slower.

As long as searchval is small compared to values this general approach should be efficient. There is a np.in1d that does this sort of match, but with a or test. So it isn't applicable. But it too uses this iterative approach is the searchval list is small enough.

Generalized slicing

In [716]: values
Out[716]: array([0, 1, 2, 1, 2, 4, 5, 6, 1, 2, 1])
In [717]: searchvals=[1,2,1]
In [718]: idx = [np.s_[i:m-n+1+i] for i in range(n)]
In [719]: idx
Out[719]: [slice(0, 9, None), slice(1, 10, None), slice(2, 11, None)]
In [720]: [values[idx[i]] == searchvals[i] for i in range(n)]
Out[720]: 
[array([False,  True, False,  True, False, False, False, False,  True], dtype=bool),
 array([False,  True, False,  True, False, False, False, False,  True], dtype=bool),
 array([False,  True, False, False, False, False,  True, False,  True], dtype=bool)]
In [721]: np.all(_, axis=0)
Out[721]: array([False,  True, False, False, False, False, False, False,  True], dtype=bool)
In [722]: np.where(_)
Out[722]: (array([1, 8], dtype=int32),)

I used the intermediate np.s_ to look at the slices and make sure they look reasonable.

as_strided

An advanced trick would be to use as_strided to construct the 'rolled' array and perform a 2d == test on that. as_strided is neat but tricky. To use it correctly you have to understand strides, and get the shape correct.

In [740]: m,n = len(values), len(searchvals)
In [741]: values.shape
Out[741]: (11,)
In [742]: values.strides
Out[742]: (4,)
In [743]: 
In [743]: M = as_strided(values, shape=(n,m-n+1),strides=(4,4))
In [744]: M
Out[744]: 
array([[0, 1, 2, 1, 2, 4, 5, 6, 1],
       [1, 2, 1, 2, 4, 5, 6, 1, 2],
       [2, 1, 2, 4, 5, 6, 1, 2, 1]])
In [745]: M == np.array(searchvals)[:,None]
Out[745]: 
array([[False,  True, False,  True, False, False, False, False,  True],
       [False,  True, False,  True, False, False, False, False,  True],
       [False,  True, False, False, False, False,  True, False,  True]], dtype=bool)
In [746]: np.where(np.all(_,axis=0))
Out[746]: (array([1, 8], dtype=int32),)
like image 179
hpaulj Avatar answered Oct 21 '22 06:10

hpaulj


Couldn't you simply use np.where (assuming this is the optimal way to find an element) and then only check pattens which satisfy the first condition.

import numpy as np
values = np.array([0,1,2,1,2,4,5,6,1,2,1])
searchval = [1,2]
N = len(searchval)
possibles = np.where(values == searchval[0])[0]

solns = []
for p in possibles:
    check = values[p:p+N]
    if np.all(check == searchval):
        solns.append(p)

print(solns)
like image 7
Ed Smith Avatar answered Oct 21 '22 08:10

Ed Smith


I think this does the job:

np.where((values == 1) & (np.roll(values,-1) == 2))[0]
like image 3
jnsod Avatar answered Oct 21 '22 06:10

jnsod