
NumPy: search an array for multiple values and return their indices

How can I search for a small set of values in a numpy array (not sorted, and shouldn't be changed)? It should return the indices of those values.

For example:

a = np.array(['d', 'v', 'h', 'r', 'm', 'a'])   # in general it will be large
query = np.array(['a', 'v', 'd'])

# Required:
indx = someNumpyFunction(a, query)

print(indx)       # should be [5, 1, 0]

I'm a beginner in numpy and I couldn't find the proper way to do this task for multiple values at the same time (I know np.where(a=='d') can do it for a single value search).

Doaa asked Sep 27 '22 17:09



2 Answers

A classic way of checking one array against another is to adjust the shape and use '==' (arr here is the question's array a):

In [250]: arr==query[:,None]
Out[250]: 
array([[False, False, False, False, False,  True],
       [False,  True, False, False, False, False],
       [ True, False, False, False, False, False]], dtype=bool)

In [251]: np.where(arr==query[:,None])
Out[251]: (array([0, 1, 2]), array([5, 1, 0]))

If an element of query isn't found in a, its 'row' will be missing from the first returned array, e.g. [0,2] instead of [0,1,2]:

In [261]: np.where(arr==np.array(['a','x','v'],dtype='S')[:,None])
Out[261]: (array([0, 2]), array([5, 1]))   
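To get one index per query element even when some are missing, the row indices from the broadcast comparison can be used to fill a preallocated result. This is only a sketch: the function name find_indices_broadcast and the -1 "not found" marker are assumptions made for illustration, and it assumes each value occurs at most once in a.

```python
import numpy as np

def find_indices_broadcast(a, query, missing=-1):
    """Return, for each element of query, its index in a (or `missing`).

    Hypothetical helper; assumes each value occurs at most once in a.
    """
    # Boolean matrix of shape (len(query), len(a)): row r marks where query[r] occurs.
    matches = query[:, None] == a
    rows, cols = np.nonzero(matches)
    # Start with the "not found" marker and overwrite the rows that had a match.
    result = np.full(len(query), missing, dtype=int)
    result[rows] = cols
    return result

a = np.array(['d', 'v', 'h', 'r', 'm', 'a'])
query = np.array(['a', 'x', 'v'])      # 'x' is not in a
result = find_indices_broadcast(a, query)
print(result)                          # [ 5 -1  1]
```

The comparison builds a len(query) x len(a) boolean array, so this suits a small query against a large a, as in the question, but not two large arrays.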

For this small example, it is considerably faster than a list comprehension equivalent:

np.hstack([(arr==i).nonzero()[0] for i in query])

It's a little slower than the searchsorted solution. (In that solution, a query element that is not found either gives an out-of-bounds i or silently maps to the index of a different element.)


Stefano suggested fromiter. It saves some time compared to hstack of a list:

In [313]: timeit np.hstack([(arr==i).nonzero()[0] for i in query])
10000 loops, best of 3: 49.5 us per loop

In [314]: timeit np.fromiter(((arr==i).nonzero()[0] for i in query), dtype=int, count=len(query))
10000 loops, best of 3: 35.3 us per loop

But it raises an error if an element is missing, or if there are multiple occurrences. hstack can handle variable-length entries; fromiter cannot.
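The difference shows up as soon as a per-query result does not have exactly one entry. The sketch below uses made-up data (a duplicated 'v' and a missing 'x') and catches both TypeError and ValueError, since the exact exception raised by fromiter is an assumption here and may vary between NumPy versions.

```python
import numpy as np

arr = np.array(['d', 'v', 'h', 'r', 'm', 'a', 'v'])   # 'v' occurs twice
query = np.array(['x', 'a', 'v'])                     # 'x' is missing

# One index array per query element; lengths here are 0, 1 and 2.
per_query = [(arr == q).nonzero()[0] for q in query]

# hstack simply concatenates, whatever the individual lengths are.
flat = np.hstack(per_query)
print(flat)   # [5 1 6]

# fromiter expects every item to be a single scalar, so it fails on this input.
try:
    np.fromiter(iter(per_query), dtype=int, count=len(query))
    fromiter_failed = False
except (TypeError, ValueError):
    fromiter_failed = True
print(fromiter_failed)
```

Note that the flattened hstack result no longer records which query element each index belongs to; keep per_query if that mapping matters.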

np.flatnonzero(arr==i) is slower than (arr==i).nonzero()[0], but I haven't looked into why.

hpaulj answered Oct 16 '22 23:10


You can use np.searchsorted on the sorted array, then revert the returned indices to the original array. For that you may use np.argsort; as in:

>>> indx = a.argsort()  # indices that would sort the array
>>> i = np.searchsorted(a[indx], query)  # indices in the sorted array
>>> indx[i]  # indices with respect to the original array
array([5, 1, 0])

If a is of size n and query is of size k, this is O(n log n + k log n), which beats the O(n k) of a linear search when log n < k.
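searchsorted returns an insertion position, not a match, so a query value that is absent from a still gets an index. A minimal sketch of one way to guard against that follows; the helper name find_indices_sorted and the -1 marker are assumptions for illustration, and a is assumed to be non-empty.

```python
import numpy as np

def find_indices_sorted(a, query, missing=-1):
    """Index in a of each query element, or `missing` if it is absent.

    Hypothetical helper; assumes a is non-empty.
    """
    order = np.argsort(a)                  # indices that would sort a
    sorted_a = a[order]
    pos = np.searchsorted(sorted_a, query) # insertion points, may equal len(a)
    pos = np.clip(pos, 0, len(a) - 1)      # keep positions indexable
    found = sorted_a[pos] == query         # True only for real matches
    return np.where(found, order[pos], missing)

a = np.array(['d', 'v', 'h', 'r', 'm', 'a'])
query = np.array(['a', 'v', 'd', 'x', 'b'])   # 'x' and 'b' are not in a
result = find_indices_sorted(a, query)
print(result)   # [ 5  1  0 -1 -1]
```

If a contains duplicates, this returns one of the matching indices per query element rather than all of them.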

behzad.nouri answered Oct 16 '22 21:10