
numpy.unique sort based on counts

The numpy.unique function can return the counts of the unique elements when return_counts=True. The returned tuple then holds two arrays, one with the unique elements and one with their counts, and both are ordered by the unique elements. Is there a way to have both ordered by the counts array instead? I know how to do it the long way, but is there a concise one-liner or lambda-style idiom for this?

Current result:

import numpy as np

my_chr_list = ["a", "a", "a", "b", "c", "b", "d", "d"]
unique_els, counts = np.unique(my_chr_list, return_counts=True)
print(unique_els, counts)

Which returns something along the lines of this:

>>> (array(['a', 'b', 'c', 'd'], 
     dtype='<U1'), array([3, 2, 1, 2], dtype=int64))

However, what I would want to have:

>>> (array(['a', 'b', 'd', 'c'], 
     dtype='<U1'), array([3, 2, 2, 1], dtype=int64))
meow asked Feb 14 '18 10:02



1 Answer

You can't do this directly with the unique function. Instead, as a Numpythonic approach, you can use the return_counts keyword to get the count of each unique item, then call np.argsort on the negated counts to get the indices that order them from most to least frequent, and finally index both arrays with those indices.

In [33]: arr = np.array(my_chr_list)

In [34]: u, count = np.unique(arr, return_counts=True)

In [35]: count_sort_ind = np.argsort(-count)

In [36]: u[count_sort_ind]
Out[36]: 
array(['a', 'b', 'd', 'c'], 
      dtype='<U1')

In [37]: count[count_sort_ind]
Out[37]: array([3, 2, 2, 1])
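
The steps above can be wrapped in a small helper. This is only a sketch: the function name unique_sorted_by_count and its descending parameter are made up for illustration and are not part of NumPy. It also passes kind="stable" to np.argsort, which is an addition to the session above, so that items with equal counts keep the value order that np.unique produced.

```python
import numpy as np


def unique_sorted_by_count(values, descending=True):
    """Return (unique_values, counts), ordered by count instead of by value.

    Hypothetical helper for illustration; not a NumPy function.
    """
    uniques, counts = np.unique(values, return_counts=True)
    # Negating the counts turns argsort's ascending order into descending.
    keys = -counts if descending else counts
    # A stable sort keeps tied counts in the order np.unique returned them.
    order = np.argsort(keys, kind="stable")
    return uniques[order], counts[order]


my_chr_list = ["a", "a", "a", "b", "c", "b", "d", "d"]
unique_els, counts = unique_sorted_by_count(my_chr_list)
print(unique_els, counts)  # ['a' 'b' 'd' 'c'] [3 2 2 1]
```

Without kind="stable", the relative order of 'b' and 'd' (both counted twice) is not guaranteed by the default sort, even though small inputs like this one usually come out the same.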
Mazdak answered Sep 21 '22 20:09
