 

Numpy argsort - what is it doing?

Tags:

python

numpy

Why is numpy giving this result:

import numpy
x = numpy.array([1.48,1.41,0.0,0.1])
print(x.argsort())

>[2 3 1 0]

when I'd expect it to do this:

[3 2 0 1]

Clearly my understanding of the function is lacking.

asked Oct 07 '22 12:10 by user1276273


People also ask

What is the difference between Argsort and sort?

np.sort() returns the sorted array, whereas np.argsort() returns an array of the corresponding indices.
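A minimal sketch of the difference, reusing the array from the question: np.sort gives back values, np.argsort gives back positions, and indexing with those positions reproduces the sorted values.

```python
import numpy as np

x = np.array([1.48, 1.41, 0.0, 0.1])

values = np.sort(x)      # the sorted values themselves
indices = np.argsort(x)  # positions in x, ordered from smallest to largest value

print(values.tolist())   # [0.0, 0.1, 1.41, 1.48]
print(indices)           # [2 3 1 0]

# Indexing x with the argsort result reproduces np.sort(x).
assert np.array_equal(x[indices], values)
```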

Is np.argsort stable?

Not by default: the default kind='quicksort' is not stable, but np.argsort performs a stable sort when you pass kind='stable'.
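A small illustration, assuming an array with repeated keys: with kind="stable", equal keys come out in the same relative order they had in the input.

```python
import numpy as np

# Keys with ties: the two 0s sit at positions 1 and 3, the two 1s at 0 and 2.
keys = np.array([1, 0, 1, 0])

# kind="stable" guarantees equal keys keep their original relative order,
# so position 1 comes before 3, and position 0 before 2.
order = np.argsort(keys, kind="stable")
print(order)  # [1 3 0 2]
```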

What is lexsort?

NumPy's lexsort() performs an indirect stable sort using a sequence of keys. Given multiple sorting keys, which can be thought of as columns in a spreadsheet, it returns an array of integer indices that describes the sort order by multiple columns. The last key in the sequence is the primary sort key, the second-to-last is the secondary key, and so on.
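A minimal sketch with two hypothetical spreadsheet columns, sorting by last name and breaking ties by first name. Note that the primary key is passed last.

```python
import numpy as np

# Hypothetical spreadsheet columns.
last_names = np.array(["smith", "jones", "smith", "adams"])
first_names = np.array(["zoe", "amy", "bob", "cal"])

# np.lexsort treats the LAST key as the primary sort key, so this sorts
# by last name and breaks ties (the two "smith" rows) with first name.
idx = np.lexsort((first_names, last_names))
print(idx)  # [3 1 2 0]
```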

How do you use argsort in NumPy?

numpy.argsort() performs an indirect sort along the given axis using the algorithm specified by the kind keyword. It returns an array of indices, of the same shape as the input, that would sort the array. Syntax: numpy.argsort(a, axis=-1, kind=None, order=None), where kind=None selects 'quicksort'.
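The order parameter only matters for structured arrays (arrays with named fields). A minimal sketch, assuming a hypothetical structured array of names and ages: fields not named in order are still used, in dtype order, to break ties.

```python
import numpy as np

# Hypothetical structured array: each record has a name and an age.
people = np.array(
    [("alice", 31), ("bob", 25), ("carol", 31), ("dave", 19)],
    dtype=[("name", "U10"), ("age", "i4")],
)

# Sort by the "age" field; the two 31-year-olds are ordered by "name".
idx = np.argsort(people, order="age")
print(idx)                           # [3 1 0 2]
print(people[idx]["name"].tolist())  # ['dave', 'bob', 'alice', 'carol']
```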

How to sort an array using the argsort() function in Python?

Call np.argsort() to get the indices, then use them to index the array: x[np.argsort(x)] gives the same values as np.sort(x).
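Where argsort really pays off is sorting one array by the values of another. A minimal sketch, assuming two hypothetical parallel arrays where scores[i] belongs to names[i]:

```python
import numpy as np

# Hypothetical parallel arrays: scores[i] belongs to names[i].
names = np.array(["ann", "ben", "cat", "dan"])
scores = np.array([1.48, 1.41, 0.0, 0.1])

idx = np.argsort(scores)           # ascending order of scores
print(names[idx].tolist())         # ['cat', 'dan', 'ben', 'ann']

# Reversing the indices gives descending order.
print(names[idx[::-1]].tolist())   # ['ann', 'ben', 'dan', 'cat']
```

Reversing the index array is the simplest way to get descending order, though with tied scores it also reverses the order of the tied entries.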

How to sort a NumPy array by axis in Python?

If you use numpy.argsort() on a 2-D NumPy array, the axis argument selects the direction: axis=0 sorts down each column and axis=1 sorts across each row. You do not have to pass it: the default is axis=-1, the last axis, which for a 2-D array means across each row.
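A small 2-D example (the array is made up for illustration) comparing the default axis with axis=0:

```python
import numpy as np

a = np.array([[3, 1, 2],
              [0, 5, 4]])

rows = np.argsort(a)          # default axis=-1: sorts within each row
cols = np.argsort(a, axis=0)  # sorts down each column

print(rows)
# [[1 2 0]
#  [0 2 1]]
print(cols)
# [[1 0 0]
#  [0 1 1]]
```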

What is the axis parameter of argsort?

If you use axis=0, argsort operates downward (for a 2-D or higher-dimensional array). If you use axis=1, argsort operates horizontally (again, for a 2-D or higher-dimensional array). The axis parameter is optional; it defaults to -1 (the last axis), and axis=None sorts the flattened array.
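Once you have per-axis indices, plain a[idx] does not apply them element by element (on a 2-D array it selects whole rows). A minimal sketch using np.take_along_axis, which gathers along the same axis the indices were computed on:

```python
import numpy as np

a = np.array([[3, 1, 2],
              [0, 5, 4]])

# argsort along axis 1, then gather along axis 1 to get each row sorted.
idx = np.argsort(a, axis=1)
sorted_rows = np.take_along_axis(a, idx, axis=1)
print(sorted_rows.tolist())  # [[1, 2, 3], [0, 4, 5]]

assert np.array_equal(sorted_rows, np.sort(a, axis=1))
```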


2 Answers

According to the documentation, argsort:

Returns the indices that would sort an array.

Reading the result [2 3 1 0] for x = [1.48, 1.41, 0.0, 0.1] from left to right:

  • 2 is the index of 0.0.
  • 3 is the index of 0.1.
  • 1 is the index of 1.41.
  • 0 is the index of 1.48.
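To see what argsort computes, here is a plain-Python sketch (a hypothetical helper, ignoring performance and NumPy's tie-breaking rules): list the positions 0..n-1 and order them by the value stored at each position.

```python
def argsort_py(values):
    # Sort the positions, using the value at each position as the key.
    # Python's sorted() is stable, so tied values keep their original order.
    return sorted(range(len(values)), key=lambda i: values[i])

print(argsort_py([1.48, 1.41, 0.0, 0.1]))  # [2, 3, 1, 0]
```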
answered Oct 23 '22 10:10 by falsetru


[2, 3, 1, 0] indicates that the smallest element is at index 2, the next smallest at index 3, then index 1, then index 0.

There are a number of ways to get the result you are looking for, which is the rank of each element:

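import numpy as np
import scipy.stats as stats

def using_indexed_assignment(x):
    "https://stackoverflow.com/a/5284703/190597 (Sven Marnach)"
    result = np.empty(len(x), dtype=int)
    temp = x.argsort()
    # The element at position temp[k] is the k-th smallest, so it gets rank k.
    result[temp] = np.arange(len(x))
    return result

def using_rankdata(x):
    # rankdata is 1-based and returns floats (ties get the average rank).
    return stats.rankdata(x)-1

def using_argsort_twice(x):
    "https://stackoverflow.com/a/6266510/190597 (k.rooijers)"
    # argsort of a permutation is its inverse permutation, i.e. the ranks.
    return np.argsort(np.argsort(x))

def using_digitize(x):
    # Bin each value by the sorted unique values; equals the rank when values are distinct.
    unique_vals, index = np.unique(x, return_inverse=True)
    return np.digitize(x, bins=unique_vals) - 1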

For example,

In [72]: x = np.array([1.48,1.41,0.0,0.1])

In [73]: using_indexed_assignment(x)
Out[73]: array([3, 2, 0, 1])
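Tracing using_indexed_assignment by hand on the same array shows why the scatter turns argsort's output into ranks:

```python
import numpy as np

x = np.array([1.48, 1.41, 0.0, 0.1])

temp = x.argsort()             # [2 3 1 0]: index of smallest, next smallest, ...
ranks = np.empty(len(x), dtype=int)
# Scatter: the element at temp[k] gets rank k, i.e.
# ranks[2] = 0, ranks[3] = 1, ranks[1] = 2, ranks[0] = 3.
ranks[temp] = np.arange(len(x))
print(ranks)  # [3 2 0 1]
```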

This checks that they all produce the same result:

x = np.random.random(10**5)
expected = using_indexed_assignment(x)
for func in (using_argsort_twice, using_digitize, using_rankdata):
    assert np.allclose(expected, func(x))

These IPython %timeit benchmarks suggest that for large arrays, using_indexed_assignment is the fastest:

In [50]: x = np.random.random(10**5)
In [66]: %timeit using_indexed_assignment(x)
100 loops, best of 3: 9.32 ms per loop

In [70]: %timeit using_rankdata(x)
100 loops, best of 3: 10.6 ms per loop

In [56]: %timeit using_argsort_twice(x)
100 loops, best of 3: 16.2 ms per loop

In [59]: %timeit using_digitize(x)
10 loops, best of 3: 27 ms per loop

For small arrays, using_argsort_twice may be faster:

In [78]: x = np.random.random(10**2)

In [81]: %timeit using_argsort_twice(x)
100000 loops, best of 3: 3.45 µs per loop

In [79]: %timeit using_indexed_assignment(x)
100000 loops, best of 3: 4.78 µs per loop

In [80]: %timeit using_rankdata(x)
100000 loops, best of 3: 19 µs per loop

In [82]: %timeit using_digitize(x)
10000 loops, best of 3: 26.2 µs per loop
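The timings above come from IPython's %timeit. Outside IPython, a rough equivalent can be sketched with the standard timeit module. The benchmark helper and its defaults here are hypothetical, only two of the four functions are reproduced, and the numbers you get will depend on your machine and NumPy version.

```python
import functools
import timeit

import numpy as np

def using_indexed_assignment(x):
    result = np.empty(len(x), dtype=int)
    result[x.argsort()] = np.arange(len(x))
    return result

def using_argsort_twice(x):
    return np.argsort(np.argsort(x))

def benchmark(sizes=(10**2, 10**5), number=10, repeat=3):
    """Hypothetical helper: return {(n, func_name): best seconds per call}."""
    timings = {}
    for n in sizes:
        x = np.random.random(n)
        for func in (using_indexed_assignment, using_argsort_twice):
            # Best of `repeat` runs, each calling func(x) `number` times.
            runs = timeit.repeat(functools.partial(func, x), repeat=repeat, number=number)
            timings[(n, func.__name__)] = min(runs) / number
    return timings

if __name__ == "__main__":
    for (n, name), secs in benchmark().items():
        print(f"n={n:>6}  {name:<26} {secs * 1e6:10.1f} µs per call")
```

Taking the minimum over repeats, as %timeit's "best of 3" does, reduces noise from other processes.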

Note also that stats.rankdata gives you more control over how elements of equal value are ranked, through its method parameter ('average', 'min', 'max', 'dense' or 'ordinal').
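A small sketch of how the tie-handling choices differ, assuming an array with one pair of equal values. Ranks are shifted to be zero-based, to match the other functions.

```python
import numpy as np
from scipy import stats

x = np.array([10, 20, 20, 30])  # positions 1 and 2 hold equal values

# Zero-based ranks under different tie-handling rules.
ordinal = np.argsort(np.argsort(x, kind="stable"))  # ties get distinct ranks
average = stats.rankdata(x, method="average") - 1   # ties share the mean rank
minimum = stats.rankdata(x, method="min") - 1       # ties share the lowest rank
dense = stats.rankdata(x, method="dense") - 1       # no gap after a tie

print(ordinal.tolist())              # [0, 1, 2, 3]
print(average.tolist())              # [0.0, 1.5, 1.5, 3.0]
print(minimum.astype(int).tolist())  # [0, 1, 1, 3]
print(dense.astype(int).tolist())    # [0, 1, 1, 2]
```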

answered Oct 23 '22 10:10 by unutbu