Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can numpy argsort handle ties? [duplicate]

I have a numpy array:

foo = array([3, 1, 4, 0, 1, 0])

I want the top 3 items. Calling

foo.argsort()[::-1][:3]

returns

array([2, 0, 4])

Notice values foo[1] and foo[4] are equal, so numpy.argsort() handles the tie by returning the index of the item which appears last in the array; i.e. index 4.

For my application I can't have the tie breaking always bias the end of the array, so how can I implement a random tie break? That is, half the time I would get array([2, 0, 4]), and the other half I would get array([2, 0, 1]).

like image 324
BoltzmannBrain Avatar asked Jul 11 '15 01:07

BoltzmannBrain


1 Answers

Here's one approach:

Use numpy.unique to both sort the array and remove duplicate items. Pass the return_inverse argument to get the indices into the sorted array that give the values of the original array. Then, you can get all of the indices of the tied items by finding the indices of the inverse array whose values are equal to the index into the unique array for that item.

For example:

foo = array([3, 1, 4, 0, 1, 0])
foo_unique, foo_inverse = unique(foo, return_inverse=True)

# Put largest items first
foo_unique = foo_unique[::-1]
foo_inverse = -foo_inverse + len(foo_unique) - 1

foo_top3 = foo_unique[:3]

# Get the indices into foo of the top item
first_indices = (foo_inverse == 0).nonzero()

# Choose one at random
first_random_idx = random.choice(first_indices)

second_indices = (foo_inverse == 1).nonzero()
second_random_idx = random.choice(second_indices)

# And so on...

numpy.unique is implemented using argsort, so a glance at its implementation might suggest a simpler approach.

like image 113
codewarrior Avatar answered Nov 15 '22 07:11

codewarrior