Search for a pattern in numpy array

Question

Is there a simple way to find all relevant elements in NumPy array according to some pattern?

For example, consider the following array:

a = array(['zzzz', 'zzzd', 'zzdd', 'zddd', 'dddn', 'ddnz', 'dnzn', 'nznz',
       'znzn', 'nznd', 'zndd', 'nddd', 'ddnn', 'dnnn', 'nnnz', 'nnzn',
       'nznn', 'znnn', 'nnnn', 'nnnd', 'nndd', 'dddz', 'ddzn', 'dznn',
       'znnz', 'nnzz', 'nzzz', 'zzzn', 'zznn', 'dddd', 'dnnd'], dtype=object)

And I need to to find all combinations which contain '**dd'.

I basically need a function, which receives the array as input and returns a smaller array with all relevant elements:

>> b = func(a, pattern='**dd')
>> b = array(['zzdd', 'zddd', 'zndd', 'nddd', 'nndd', 'dddd'], dtype=object)

DSM · Accepted Answer

Since it turns out you're actually working with pandas, there are simpler ways to do it at the Series level instead of just an ndarray, using the vectorized string operations:

In [32]: s = pd.Series(['zzzz', 'zzzd', 'zzdd', 'zddd', 'dddn', 'ddnz', 'dnzn', 'nznz',
    ...:        'znzn', 'nznd', 'zndd', 'nddd', 'ddnn', 'dnnn', 'nnnz', 'nnzn',
    ...:        'nznn', 'znnn', 'nnnn', 'nnnd', 'nndd', 'dddz', 'ddzn', 'dznn',
    ...:        'znnz', 'nnzz', 'nzzz', 'zzzn', 'zznn', 'dddd', 'dnnd'])

In [33]: s[s.str.endswith("dd")]
Out[33]: 
2     zzdd
3     zddd
10    zndd
11    nddd
20    nndd
29    dddd
dtype: object

which produces a Series, or if you really insist on an ndarray:

In [34]: s[s.str.endswith("dd")].values
Out[34]: array(['zzdd', 'zddd', 'zndd', 'nddd', 'nndd', 'dddd'], dtype=object)

You can also use regular expressions, if you prefer:

In [49]: s[s.str.match(".*dd$")]
Out[49]: 
2     zzdd
3     zddd
10    zndd
11    nddd
20    nndd
29    dddd
dtype: object

Divakar · Answer

Here's an approach using numpy.core.defchararray.rfind to get us the last index of a match and then we check if that index is 2 minus the length of each string. Now, the length of each string is 4 here, so we would look for the last indices that are 4 - 2 = 2.

Thus, an implementation would be -

a[np.core.defchararray.rfind(a.astype(str),'dd')==2]

If the strings are not of equal lengths, we need to get the lengths, subtract 2 and then compare -

len_sub = np.array(list(map(len,a)))-len('dd')
a[np.core.defchararray.rfind(a.astype(str),'dd')==len_sub]

To test this out, let's add a longer string ending with dd at the end of the given sample -

In [121]: a = np.append(a,'ewqjejwqjedd')

In [122]: len_sub = np.array(list(map(len,a)))-len('dd')

In [123]: a[np.core.defchararray.rfind(a.astype(str),'dd')==len_sub]
Out[123]: array(['zzdd', 'zddd', 'zndd', 'nddd', 'nndd', 'dddd',\
                 'ewqjejwqjedd'], dtype=object)

Search for a pattern in numpy array

Tags:

python

numpy

Arnold Klein

2 Answers

DSM

Divakar

Recent Activity

Donate For Us

Search for a pattern in numpy array

Tags:

python

numpy

Arnold Klein

2 Answers

DSM

Divakar

Related questions

Recent Activity

Donate For Us