I am trying to find patterns in a numpy array called values. I'd like to return the starting index position of each occurrence of the pattern. I know I could iterate over each element and check whether that element and the next one match the pattern, but over a large dataset that is incredibly inefficient, and I am looking for a better alternative.
I've got a working solution using np.where for searching for a single value, but I can't get it to work for finding a pattern of two numbers.
Example:
import numpy as np
values = np.array([0,1,2,1,2,4,5,6,1,2,1])
searchval = [1,2]
print np.where(values == searchval)[0]
Output:
[]
Expected Output:
[1, 3, 8]
Here's a straightforward approach using np.where. Start with a logical expression that finds the matches:
In [670]: values = np.array([0,1,2,1,2,4,5,6,1,2,1])
...: searchval = [1,2]
...:
In [671]: (values[:-1]==searchval[0]) & (values[1:]==searchval[1])
Out[671]: array([False, True, False, True, False, False, False, False, True, False], dtype=bool)
In [672]: np.where(_)
Out[672]: (array([1, 3, 8], dtype=int32),)
That could be generalized into a loop that operates on multiple searchval elements. Getting the slice range correct will take some fiddling. The roll suggested in another answer might be easier, but I suspect a bit slower.
As long as searchval is small compared to values, this general approach should be efficient. There is a np.in1d that does this sort of match, but with an or test, so it isn't applicable here. But it too uses this iterative approach if the searchval list is small enough.
In [716]: values
Out[716]: array([0, 1, 2, 1, 2, 4, 5, 6, 1, 2, 1])
In [717]: searchvals=[1,2,1]; m,n = len(values), len(searchvals)
In [718]: idx = [np.s_[i:m-n+1+i] for i in range(n)]
In [719]: idx
Out[719]: [slice(0, 9, None), slice(1, 10, None), slice(2, 11, None)]
In [720]: [values[idx[i]] == searchvals[i] for i in range(n)]
Out[720]:
[array([False, True, False, True, False, False, False, False, True], dtype=bool),
array([False, True, False, True, False, False, False, False, True], dtype=bool),
array([False, True, False, False, False, False, True, False, True], dtype=bool)]
In [721]: np.all(_, axis=0)
Out[721]: array([False, True, False, False, False, False, False, False, True], dtype=bool)
In [722]: np.where(_)
Out[722]: (array([1, 8], dtype=int32),)
I used the intermediate np.s_ to look at the slices and make sure they look reasonable.
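The slice-and-compare steps above can be wrapped into a small function. This is only a sketch: the name find_pattern and its handling of empty or over-long patterns are my own assumptions, not part of the answer.

```python
import numpy as np

def find_pattern(values, pattern):
    """Return start indices where `pattern` occurs in 1-D `values`.

    Hypothetical helper; the name and edge-case handling are assumptions.
    """
    values = np.asarray(values)
    pattern = np.asarray(pattern)
    m, n = len(values), len(pattern)
    if n == 0 or n > m:
        return np.array([], dtype=np.intp)
    # Start with every candidate start position marked as a match.
    mask = np.ones(m - n + 1, dtype=bool)
    for i in range(n):
        # Compare the i-th pattern element against values shifted by i.
        mask &= values[i:m - n + 1 + i] == pattern[i]
    return np.flatnonzero(mask)

values = np.array([0, 1, 2, 1, 2, 4, 5, 6, 1, 2, 1])
print(find_pattern(values, [1, 2]))     # [1 3 8]
print(find_pattern(values, [1, 2, 1]))  # [1 8]
```

The loop runs once per pattern element, not once per array element, so it stays cheap as long as the pattern is short.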
An advanced trick would be to use as_strided to construct the 'rolled' array and perform a 2d == test on that. as_strided is neat but tricky. To use it correctly you have to understand strides and get the shape correct.
In [740]: m,n = len(values), len(searchvals)
In [741]: values.shape
Out[741]: (11,)
In [742]: values.strides
Out[742]: (4,)
In [743]: from numpy.lib.stride_tricks import as_strided
...: M = as_strided(values, shape=(n,m-n+1),strides=(4,4))
In [744]: M
Out[744]:
array([[0, 1, 2, 1, 2, 4, 5, 6, 1],
[1, 2, 1, 2, 4, 5, 6, 1, 2],
[2, 1, 2, 4, 5, 6, 1, 2, 1]])
In [745]: M == np.array(searchvals)[:,None]
Out[745]:
array([[False, True, False, True, False, False, False, False, True],
[False, True, False, True, False, False, False, False, True],
[False, True, False, False, False, False, True, False, True]], dtype=bool)
In [746]: np.where(np.all(_,axis=0))
Out[746]: (array([1, 8], dtype=int32),)
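If your NumPy is recent enough (1.20 or later, as far as I know), sliding_window_view from numpy.lib.stride_tricks builds the same kind of windowed view without hand-written shape and strides. It is a different call from the as_strided used above, so treat this as an alternative sketch, not the same method:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

values = np.array([0, 1, 2, 1, 2, 4, 5, 6, 1, 2, 1])
searchvals = np.array([1, 2, 1])

# Each row is one window of len(searchvals) consecutive elements
# (the transpose of the as_strided layout), returned as a read-only view.
windows = sliding_window_view(values, len(searchvals))

# A window matches when all of its elements equal the pattern.
matches = np.flatnonzero((windows == searchvals).all(axis=1))
print(matches)  # [1 8]
```

Because the window shape is derived from the array itself, this avoids hard-coding the 4-byte stride, which only holds for a 32-bit integer array.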
Couldn't you simply use np.where (assuming this is the optimal way to find an element) and then only check the patterns that satisfy the first condition?
import numpy as np
values = np.array([0,1,2,1,2,4,5,6,1,2,1])
searchval = [1,2]
N = len(searchval)
possibles = np.where(values == searchval[0])[0]
solns = []
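for p in possibles:
    # compare the window starting at p with the whole pattern;
    # array_equal returns False for a short window at the end
    check = values[p:p+N]
    if np.array_equal(check, searchval):
        solns.append(p)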
print(solns)
I think this does the job:
np.where((values == 1) & (np.roll(values,-1) == 2))[0]
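One thing to watch with np.roll is that it wraps around, so a pattern that starts at the last element and "continues" at the first one is reported as a match. A rough sketch of a generalized version that masks those positions out follows; find_pattern_roll is a made-up name, and it assumes a non-empty pattern:

```python
import numpy as np

def find_pattern_roll(values, pattern):
    """Hypothetical helper: roll-based search for a non-empty pattern."""
    values = np.asarray(values)
    n = len(pattern)
    mask = np.ones(len(values), dtype=bool)
    for i, p in enumerate(pattern):
        # Shift the array left by i so position k lines up with values[k+i].
        mask &= np.roll(values, -i) == p
    if n > 1:
        # The last n-1 start positions compared against wrapped-around data.
        mask[-(n - 1):] = False
    return np.flatnonzero(mask)

values = np.array([2, 1, 2, 4, 1])
# The plain roll expression also reports index 4, where the trailing 1
# is paired with the leading 2.
print(np.where((values == 1) & (np.roll(values, -1) == 2))[0])  # [1 4]
print(find_pattern_roll(values, [1, 2]))                        # [1]
```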