Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find unsorted indices after using numpy.searchsorted

I have a large (millions) array of ID numbers ids, and I want to find the indices where another array of targets (targets) exist in the ids array. For example, if

ids = [22, 5, 4, 0, 100]
targets = [5, 0]

then I want the result:

>>> [1,3]

If I pre-sort the array of ids, then it's easy to find matches using numpy.searchsorted, e.g.

>>> ids = np.array([22, 5, 4, 0, 100])
>>> targets = [5, 0]
>>> sort = np.argsort(ids)
>>> ids[sort]
[0,4,5,22,100]
>>> np.searchsorted(ids, targets, sorter=sort)
[2,0]

But how can I find the reverse mapping to 'unsort' this result? I.e. to map the sorted entries at [2,0] back to where they were before: [1,3].

like image 868
DilithiumMatrix Avatar asked Apr 14 '15 19:04

DilithiumMatrix


People also ask

How do I get the indices of a sorted NumPy array?

We can get the indices of the sorted elements of a given array with the help of argsort() method. This function is used to perform an indirect sort along the given axis using the algorithm specified by the kind keyword. It returns an array of indices of the same shape as arr that that would sort the array.

What does np searchsorted do?

searchsorted() function is used to find the indices into a sorted array arr such that, if elements are inserted before the indices, the order of arr would be still preserved. Here, binary search is used to find the required insertion indices.

Can you sort NumPy array?

The NumPy ndarray object has a function called sort() , that will sort a specified array.


3 Answers

There are a few answers dancing around this already, but just to make it clear all you need to do is use sort[rank].

# Setup
ids = np.array([22, 5, 4, 0, 100])
targets = np.array([5, 0])

sort = np.argsort(ids)
rank = np.searchsorted(ids, targets, sorter=sort)
print(sort[rank])
# array([1, 3])
like image 50
Bi Rico Avatar answered Oct 17 '22 14:10

Bi Rico


Could you just do this?

sort[np.searchsorted(ids, targets, sorter=sort)]

Alternatively:

np.hstack([np.where(ids==x)[0] for x in targets])

both give:

array([1, 3])
like image 34
atomh33ls Avatar answered Oct 17 '22 15:10

atomh33ls


I think I've come up with something.

We can construct a 'cipher' or sorts: key = numpy.arange(len(ids)) applying the initial sorter to this key then gives the reverse mapping: revsort = key[np.argsort(ids)]


edit: as @birico points out, key[sort] is identical to sort itself!

>>> sort = np.argsort(ids)
>>> ids[sort]
[0,4,5,22,100]
>>> found = np.searchsorted(ids, targets, sorter=sort)
>>> found
[2,0]
>>> sort[found]
[1,3]
like image 1
DilithiumMatrix Avatar answered Oct 17 '22 14:10

DilithiumMatrix