Returns the indices that would sort an array. Perform an indirect sort along the given axis using the algorithm specified by the kind keyword. It returns an array of indices of the same shape as a that index data along the given axis in sorted order.
sort() returns the sorted array whereas np. argsort() returns an array of the corresponding indices.
sort() method sorts the elements of a list in ascending or descending order using the default < comparisons operator between items. Use the key parameter to pass the function name to be used for comparison instead of the default < operator. Set the reverse parameter to True, to get the list in descending order.
If you negate an array, the lowest elements become the highest elements and vice-versa. Therefore, the indices of the n
highest elements are:
(-avgDists).argsort()[:n]
Another way to reason about this, as mentioned in the comments, is to observe that the big elements are coming last in the argsort. So, you can read from the tail of the argsort to find the n
highest elements:
avgDists.argsort()[::-1][:n]
Both methods are O(n log n) in time complexity, because the argsort
call is the dominant term here. But the second approach has a nice advantage: it replaces an O(n) negation of the array with an O(1) slice. If you're working with small arrays inside loops then you may get some performance gains from avoiding that negation, and if you're working with huge arrays then you can save on memory usage because the negation creates a copy of the entire array.
Note that these methods do not always give equivalent results: if a stable sort implementation is requested to argsort
, e.g. by passing the keyword argument kind='mergesort'
, then the first strategy will preserve the sorting stability, but the second strategy will break stability (i.e. the positions of equal items will get reversed).
Example timings:
Using a small array of 100 floats and a length 30 tail, the view method was about 15% faster
>>> avgDists = np.random.rand(100)
>>> n = 30
>>> timeit (-avgDists).argsort()[:n]
1.93 µs ± 6.68 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> timeit avgDists.argsort()[::-1][:n]
1.64 µs ± 3.39 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> timeit avgDists.argsort()[-n:][::-1]
1.64 µs ± 3.66 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
For larger arrays, the argsort is dominant and there is no significant timing difference
>>> avgDists = np.random.rand(1000)
>>> n = 300
>>> timeit (-avgDists).argsort()[:n]
21.9 µs ± 51.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> timeit avgDists.argsort()[::-1][:n]
21.7 µs ± 33.3 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> timeit avgDists.argsort()[-n:][::-1]
21.9 µs ± 37.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Please note that the comment from nedim below is incorrect. Whether to truncate before or after reversing makes no difference in efficiency, since both of these operations are only striding a view of the array differently and not actually copying data.
Just like Python, in that [::-1]
reverses the array returned by argsort()
and [:n]
gives that last n elements:
>>> avgDists=np.array([1, 8, 6, 9, 4])
>>> n=3
>>> ids = avgDists.argsort()[::-1][:n]
>>> ids
array([3, 1, 2])
The advantage of this method is that ids
is a view of avgDists:
>>> ids.flags
C_CONTIGUOUS : False
F_CONTIGUOUS : False
OWNDATA : False
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
(The 'OWNDATA' being False indicates this is a view, not a copy)
Another way to do this is something like:
(-avgDists).argsort()[:n]
The problem is that the way this works is to create negative of each element in the array:
>>> (-avgDists)
array([-1, -8, -6, -9, -4])
ANd creates a copy to do so:
>>> (-avgDists_n).flags['OWNDATA']
True
So if you time each, with this very small data set:
>>> import timeit
>>> timeit.timeit('(-avgDists).argsort()[:3]', setup="from __main__ import avgDists")
4.2879798610229045
>>> timeit.timeit('avgDists.argsort()[::-1][:3]', setup="from __main__ import avgDists")
2.8372560259886086
The view method is substantially faster (and uses 1/2 the memory...)
Instead of using np.argsort
you could use np.argpartition
- if you only need the indices of the lowest/highest n elements.
That doesn't require to sort the whole array but just the part that you need but note that the "order inside your partition" is undefined, so while it gives the correct indices they might not be correctly ordered:
>>> avgDists = [1, 8, 6, 9, 4]
>>> np.array(avgDists).argpartition(2)[:2] # indices of lowest 2 items
array([0, 4], dtype=int64)
>>> np.array(avgDists).argpartition(-2)[-2:] # indices of highest 2 items
array([1, 3], dtype=int64)
You can use the flip commands numpy.flipud()
or numpy.fliplr()
to get the indexes in descending order after sorting using the argsort
command. Thats what I usually do.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With