How can I search for a small set of values in a numpy array (not sorted, and shouldn't be changed)? It should return the indices of those values.
For example:
a = np.array(['d', 'v', 'h', 'r', 'm', 'a']) # in general it will be large
query = np.array(['a', 'v', 'd'])
# Required:
indx = someNumpyFunction(a, query)
print(indx) # should be [5, 1, 0]
I'm a beginner in numpy and I couldn't find the proper way to do this task for multiple values at the same time (I know np.where(a=='d') can do it for a single value search).
Get the index of elements in a Python loop: create a NumPy array and iterate over it, comparing each element with the given values. If the element matches, print the index.
Array indexing is the same as accessing an array element. You can access an array element by referring to its index number. The indexes in NumPy arrays start with 0, meaning that the first element has index 0, and the second has index 1 etc.
If we have a NumPy array of boolean (True or False) data, we can use np.all() to check whether all of the elements are True.
A classic way of checking one array against another is to adjust the shape and use '==' (arr here is the question's a):
In [250]: arr==query[:,None]
Out[250]:
array([[False, False, False, False, False,  True],
       [False,  True, False, False, False, False],
       [ True, False, False, False, False, False]], dtype=bool)
In [251]: np.where(arr==query[:,None])
Out[251]: (array([0, 1, 2]), array([5, 1, 0]))
If an element of query isn't found in a, its 'row' will be missing, e.g. [0,2] instead of [0,1,2]:
In [261]: np.where(arr==np.array(['a','x','v'],dtype='S')[:,None])
Out[261]: (array([0, 2]), array([5, 1]))
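As a small sketch of how the missing 'row' can be put to use, the comparison above can be wrapped in a helper that reports which query values were found. The name find_indices and the -1 placeholder are my own choices for illustration, not part of NumPy, and the sketch assumes each query value occurs at most once in the array (with duplicates, the last matching index wins).

```python
import numpy as np

def find_indices(arr, query):
    """Hypothetical helper: locate each query value in arr via broadcasting."""
    arr = np.asarray(arr)
    query = np.asarray(query)
    # Compare every query value with every array element: shape (k, n).
    matches = arr == query[:, None]
    rows, cols = np.where(matches)
    # A query value was found if its row index shows up in `rows`.
    found = np.zeros(len(query), dtype=bool)
    found[rows] = True
    # -1 marks query values that do not occur in arr (an assumed convention).
    indices = np.full(len(query), -1, dtype=int)
    indices[rows] = cols
    return found, indices

a = np.array(['d', 'v', 'h', 'r', 'm', 'a'])
found, indices = find_indices(a, np.array(['a', 'x', 'v']))
print(found)    # [ True False  True]
print(indices)  # [ 5 -1  1]
```

Note that the intermediate boolean array has k * n entries, so this is best suited to a small query.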
For this small example, it is considerably faster than a list comprehension equivalent:
np.hstack([(arr==i).nonzero()[0] for i in query])
It's a little slower than the searchsorted solution. (In that solution, if a query element is not found, i is either out of bounds or points at a different element.)
Stefano suggested fromiter. It saves some time compared to hstack of a list:
In [313]: timeit np.hstack([(arr==i).nonzero()[0] for i in query])
10000 loops, best of 3: 49.5 us per loop
In [314]: timeit np.fromiter(((arr==i).nonzero()[0] for i in query), dtype=int, count=len(query))
10000 loops, best of 3: 35.3 us per loop
But it raises an error if an element is missing, or if there are multiple occurrences. hstack can handle variable-length entries; fromiter cannot.
np.flatnonzero(arr==i) is slower than ().nonzero()[0], but I haven't looked into why.
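To illustrate the point about variable-length entries, here is a minimal sketch using made-up data in which one query value occurs twice and another not at all. Keeping the per-value results in a list (per_value is just an illustrative name) preserves which indices belong to which query value; hstack then flattens them.

```python
import numpy as np

# Illustrative data: 'v' occurs twice, 'x' does not occur at all.
arr = np.array(['d', 'v', 'h', 'v', 'm', 'a'])
query = np.array(['a', 'x', 'v'])

# One index array per query value; lengths may be 0, 1, or more.
per_value = [np.flatnonzero(arr == q) for q in query]

# hstack concatenates entries of different lengths without complaint.
flat = np.hstack(per_value)

print([p.tolist() for p in per_value])  # [[5], [], [1, 3]]
print(flat.tolist())                    # [5, 1, 3]
```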
You can use np.searchsorted on the sorted array, then map the returned indices back to the original array. For that you may use np.argsort, as in:
>>> indx = a.argsort() # indices that would sort the array
>>> i = np.searchsorted(a[indx], query) # indices in the sorted array
>>> indx[i] # indices with respect to the original array
array([5, 1, 0])
If a is of size n and query is of size k, this will be O(n log n + k log n), which would be faster than O(n k) for linear search if log n < k.
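Because np.searchsorted returns an insertion position whether or not the value is present, a query value that is missing from a yields either an out-of-bounds position or the index of some other element. The following is a minimal sketch of one way to guard against that, assuming a is non-empty; the function name search_unsorted and the -1 placeholder are hypothetical choices, not NumPy API. It uses the sorter argument of np.searchsorted so that a itself is never reordered.

```python
import numpy as np

def search_unsorted(a, query):
    """Hypothetical helper: indices of query values in unsorted a, -1 if absent."""
    a = np.asarray(a)
    query = np.asarray(query)
    order = np.argsort(a)  # indices that would sort a
    # Positions in the sorted order; a itself is left untouched.
    pos = np.searchsorted(a, query, sorter=order)
    # Values larger than every element get position len(a); clip to stay in bounds.
    pos = np.clip(pos, 0, len(a) - 1)
    candidates = order[pos]  # indices with respect to the original array
    # Keep a candidate only if it really holds the query value.
    found = a[candidates] == query
    return np.where(found, candidates, -1)

a = np.array(['d', 'v', 'h', 'r', 'm', 'a'])
print(search_unsorted(a, np.array(['a', 'v', 'd'])))  # [5 1 0]
print(search_unsorted(a, np.array(['a', 'x', 'z'])))  # [ 5 -1 -1]
```

If a contains duplicates, this returns one matching index per query value (the first in sorted order), not all of them.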