I tried to find entries in an array containing a substring with np.where and an in condition:
    import numpy as np
    foo = "aa"
    bar = np.array(["aaa", "aab", "aca"])
    np.where(foo in bar)
This only returns an empty array.
Why is that so?
And is there a good alternative solution?
The elements of a NumPy array are usually numbers, but they can also be booleans, strings, or other objects.
With a NumPy array, the in operator checks whether a given value is present as a whole element of the array and returns a single Boolean, True or False. It does not look inside the individual elements.
You can also search an array for a certain value and get back the indexes of the matches. To do that, use the np.where() function.
We can use np.core.defchararray.find (also available under the preferred alias np.char.find) to find the position of the foo string in each element of bar. It returns -1 for elements where the string is not found, so checking the output of find against -1 tells us whether foo is present in each element. Finally, we would use np.flatnonzero to get the indices of the matches. So, we would have an implementation like so:
    np.flatnonzero(np.core.defchararray.find(bar,foo)!=-1)
Sample run:
    In [91]: bar
    Out[91]: array(['aaa', 'aab', 'aca'], dtype='|S3')
    In [92]: foo
    Out[92]: 'aa'
    In [93]: np.flatnonzero(np.core.defchararray.find(bar,foo)!=-1)
    Out[93]: array([0, 1])
    In [94]: bar[2] = 'jaa'
    In [95]: np.flatnonzero(np.core.defchararray.find(bar,foo)!=-1)
    Out[95]: array([0, 1, 2])
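The same idea can be wrapped in a small helper. This is only a sketch: the function name substring_indices is made up for illustration, and it uses np.char.find, the alias of the defchararray function used above.

```python
import numpy as np

def substring_indices(arr, sub):
    """Return the indices of the elements of `arr` that contain `sub`.

    Hypothetical helper, not part of NumPy.
    """
    # np.char.find gives the lowest index of `sub` in each element, or -1 if absent
    found = np.char.find(arr, sub) != -1
    # flatnonzero turns the boolean mask into the positions of the True entries
    return np.flatnonzero(found)

bar = np.array(["aaa", "aab", "aca"])
indices = substring_indices(bar, "aa")
print(indices)       # [0 1]
print(bar[indices])  # ['aaa' 'aab']
```

Indexing bar with the result gives the matching strings themselves, which is often what is wanted in the end.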
Look at some examples of using in:
    In [19]: bar = np.array(["aaa", "aab", "aca"])
    In [20]: 'aa' in bar
    Out[20]: False
    In [21]: 'aaa' in bar
    Out[21]: True
    In [22]: 'aab' in bar
    Out[22]: True
    In [23]: 'aab' in list(bar)
It looks like in, when used with an array, works as though the array was a list. ndarray does have a __contains__ method, so in works, but it is probably simple.
But in any case, note that in alist does not check for substrings. A string's __contains__ does the substring test, but I don't know of any builtin class that propagates the test down to the component strings.
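A short sketch of the difference, using the bar array from the question. The comparison with (bar == "aa").any() is my reading of how the array-level test behaves for this case, not a statement about the internals; the list comprehension is just one plain-Python way to push the substring test down to each element.

```python
import numpy as np

bar = np.array(["aaa", "aab", "aca"])

# On the array, `in` asks whether some whole element equals the value.
whole_match = "aa" in bar                 # False: no element is exactly "aa"
same_as_any = bool((bar == "aa").any())   # False as well, for the same reason

# On a single string, `in` is a substring test.
sub_match = "aa" in "aaa"                 # True

# Applying the string-level test to each element by hand gives a boolean mask.
mask = np.array(["aa" in s for s in bar])
print(mask)  # [ True  True False]
```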
As Divakar shows, there is a collection of numpy functions that applies string methods to individual elements of an array.
    In [42]: np.char.find(bar, 'aa')
    Out[42]: array([ 0, 0, -1])
Docstring:
This module contains a set of functions for vectorized string operations and methods. The preferred alias for defchararray is numpy.char.
For operations like this I think the np.char speeds are about the same as with:
    In [49]: np.frompyfunc(lambda x: x.find('aa'), 1, 1)(bar)
    Out[49]: array([0, 0, -1], dtype=object)
    In [50]: np.frompyfunc(lambda x: 'aa' in x, 1, 1)(bar)
    Out[50]: array([True, True, False], dtype=object)
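To get from the frompyfunc result to the indices asked for in the question, the object array can be cast to bool and passed to np.flatnonzero. A minimal sketch; the name contains_aa is only illustrative.

```python
import numpy as np

bar = np.array(["aaa", "aab", "aca"])

# Wrap the plain-Python substring test so it applies to every element.
# frompyfunc returns an object-dtype array, hence the cast to bool below.
contains_aa = np.frompyfunc(lambda s: "aa" in s, 1, 1)

mask = contains_aa(bar).astype(bool)
indices = np.flatnonzero(mask)
print(indices)  # [0 1]
```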
Further tests suggest that the ndarray __contains__ operates on the flat version of the array; that is, shape doesn't affect its behavior.
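A quick check of that observation on a 2-D array. The array grid is made up for this example.

```python
import numpy as np

# A 2x2 array of strings; "jaa" sits in the second row
grid = np.array([["aaa", "aab"], ["aca", "jaa"]])

in_2d = "jaa" in grid            # whole-element match anywhere in the array
in_flat = "jaa" in grid.ravel()  # same test on the flattened array
sub_2d = "aa" in grid            # still no substring matching

print(in_2d, in_flat, sub_2d)  # True True False
```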