I have a large (millions of elements) array of ID numbers, ids, and I want to find the indices where the elements of another array, targets, exist in the ids array. For example, if
ids = [22, 5, 4, 0, 100]
targets = [5, 0]
then I want the result:
[1, 3]
If I pre-sort the array of ids, then it's easy to find matches using numpy.searchsorted, e.g.
>>> import numpy as np
>>> ids = np.array([22, 5, 4, 0, 100])
>>> targets = [5, 0]
>>> sort = np.argsort(ids)
>>> ids[sort]
array([  0,   4,   5,  22, 100])
>>> np.searchsorted(ids, targets, sorter=sort)
array([2, 0])
But how can I find the reverse mapping to 'unsort' this result? I.e., how do I map the sorted entries at [2, 0] back to where they were before, [1, 3]?
We can get the indices that would sort a given array with the argsort() method. It performs an indirect sort along the given axis using the algorithm specified by the kind keyword, and returns an array of indices, of the same shape as arr, that would sort the array.
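A minimal sketch of argsort on the example array. The second part, using a small hypothetical array dup with repeated values, shows what the kind keyword controls: kind="stable" keeps equal values in their original relative order.

```python
import numpy as np

ids = np.array([22, 5, 4, 0, 100])

# sort[k] is the index (in ids) of the k-th smallest value
sort = np.argsort(ids)
print(sort.tolist())       # [3, 2, 1, 0, 4]
print(ids[sort].tolist())  # [0, 4, 5, 22, 100]

# kind selects the sorting algorithm; "stable" keeps equal values
# in the order they appear in the input (dup is illustrative only)
dup = np.array([3, 1, 3, 1])
print(np.argsort(dup, kind="stable").tolist())  # [1, 3, 0, 2]
```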
The searchsorted() function finds the indices into a sorted array arr such that, if the elements were inserted before those indices, the order of arr would still be preserved. It uses binary search to find the insertion indices.
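To illustrate the insertion-index semantics, here is a short sketch on the already-sorted ids. Note the last call: searchsorted returns a position even for values that are not in the array. It answers "where would this go?", not "is this present?".

```python
import numpy as np

sorted_ids = np.array([0, 4, 5, 22, 100])

# Default side="left": first position where each value could be inserted
# while keeping sorted_ids sorted
left = np.searchsorted(sorted_ids, [5, 0])
print(left.tolist())    # [2, 0]

# side="right": position just after any elements equal to the value
right = np.searchsorted(sorted_ids, [5, 0], side="right")
print(right.tolist())   # [3, 1]

# Absent values still get an insertion index; there is no membership check
absent = np.searchsorted(sorted_ids, [6, 1000])
print(absent.tolist())  # [3, 5]
```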
The NumPy ndarray object also has a method called sort(), which sorts the array in place; the function np.sort() returns a sorted copy instead.
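A quick sketch of the difference between the two, using the same example values:

```python
import numpy as np

a = np.array([22, 5, 4, 0, 100])

# np.sort returns a new, sorted array and leaves a unchanged
b = np.sort(a)
print(a.tolist())  # [22, 5, 4, 0, 100]
print(b.tolist())  # [0, 4, 5, 22, 100]

# ndarray.sort() sorts a itself and returns None
result = a.sort()
print(result)      # None
print(a.tolist())  # [0, 4, 5, 22, 100]
```

For this problem the in-place sort is not enough on its own, because it discards the original positions. That is why the thread uses argsort.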
There are a few answers dancing around this already, but just to make it clear: all you need to do is use sort[rank].
# Setup
import numpy as np

ids = np.array([22, 5, 4, 0, 100])
targets = np.array([5, 0])
sort = np.argsort(ids)
rank = np.searchsorted(ids, targets, sorter=sort)
print(sort[rank])
# [1 3]
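One caveat with sort[rank] is that searchsorted never checks whether a target is actually in ids, so a missing target silently maps to some unrelated index. A target larger than every id would even produce rank == len(ids), which is out of bounds for sort. Below is a sketch of one way to guard against this. The helper name find_indices and the choice of -1 as the "not found" marker are hypothetical, and it assumes ids is non-empty with unique values.

```python
import numpy as np

def find_indices(ids, targets, missing=-1):
    """Hypothetical helper: index in ids of each target, or `missing` if absent.

    Assumes ids is non-empty and its values are unique.
    """
    ids = np.asarray(ids)
    targets = np.asarray(targets)
    sort = np.argsort(ids)
    rank = np.searchsorted(ids, targets, sorter=sort)
    # Targets above the largest id get rank == len(ids); clip so the lookup
    # stays in bounds (the equality check below still rejects them)
    rank = np.clip(rank, 0, len(ids) - 1)
    idx = sort[rank]
    # Keep an index only where the id there really equals the target
    found = ids[idx] == targets
    return np.where(found, idx, missing)

ids = np.array([22, 5, 4, 0, 100])
print(find_indices(ids, [5, 0]).tolist())        # [1, 3]
print(find_indices(ids, [5, 7, 1000]).tolist())  # [1, -1, -1]
```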
Could you just do this?
sort[np.searchsorted(ids, targets, sorter=sort)]
Alternatively:
np.hstack([np.where(ids==x)[0] for x in targets])
Both give:
array([1, 3])
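A note on the np.where alternative: each ids == x comparison scans the whole of ids, so the loop costs roughly len(ids) * len(targets) operations. With millions of ids and many targets, that can be much slower than the argsort/searchsorted route. The sketch below also shows np.isin, a vectorized membership test that is not from the original thread. Its result follows the order of ids rather than the order of targets, which matters if you need one index per target in target order.

```python
import numpy as np

ids = np.array([22, 5, 4, 0, 100])
targets = np.array([5, 0])

# One full scan of ids per target
loop = np.hstack([np.where(ids == x)[0] for x in targets])
print(loop.tolist())      # [1, 3]

# Single vectorized membership test; indices come out in ids order
via_isin = np.flatnonzero(np.isin(ids, targets))
print(via_isin.tolist())  # [1, 3]

# With the targets reversed, the two approaches disagree on order
rev = targets[::-1]
loop_rev = np.hstack([np.where(ids == x)[0] for x in rev])
isin_rev = np.flatnonzero(np.isin(ids, rev))
print(loop_rev.tolist())  # [3, 1]
print(isin_rev.tolist())  # [1, 3]
```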
I think I've come up with something.
We can construct a 'cipher' of sorts: key = numpy.arange(len(ids))
Applying the initial sorter to this key then gives the reverse mapping: revsort = key[np.argsort(ids)]
Edit: as @birico points out, key[sort] is identical to sort itself!
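>>> import numpy as np
>>> ids = np.array([22, 5, 4, 0, 100])
>>> targets = [5, 0]
>>> sort = np.argsort(ids)
>>> ids[sort]
array([  0,   4,   5,  22, 100])
>>> found = np.searchsorted(ids, targets, sorter=sort)
>>> found
array([2, 0])
>>> sort[found]
array([1, 3])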
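To see why the 'cipher' collapses to sort itself, note that np.arange(n) is the identity permutation: key[i] == i. So key[sort] just reads back the entries of sort unchanged. A small check on the example array:

```python
import numpy as np

ids = np.array([22, 5, 4, 0, 100])
sort = np.argsort(ids)

# The 'cipher': key[i] == i for every position i
key = np.arange(len(ids))
revsort = key[sort]

# Indexing the identity permutation by sort returns sort unchanged
print(np.array_equal(revsort, sort))  # True
print(revsort.tolist())               # [3, 2, 1, 0, 4]
```

In other words, sort[k] is already the original position of the k-th smallest id. That is exactly the "unsorting" map the question asks for.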