Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sorting Counter collection in python with secondary term (tie breaker)

I have a Counter in Python 3.3.x which I want to sort.
I know I can use .most_common(x) but I want the keys to be sorted alphabetically in case of same value.
Is there a way I can do this ? Setting this kind of a 'tie breaker' ?

like image 786
SagiLow Avatar asked Apr 12 '14 17:04

SagiLow


People also ask

How does Python sort break ties?

The answer is that the python interpreter will sort the tied items in the same order they were in before the sorting.

How do you break ties in sorting?

The answer is to break the tie by using the made up data. Its completely arbitrary but now no mater how often the list is scrambled and resorted, it will always sort the elements into the same positions allowing comparison by comparing each element pair.

What does counter () do in Python?

Counter is a subclass of dict that's specially designed for counting hashable objects in Python. It's a dictionary that stores objects as keys and counts as values. To count with Counter , you typically provide a sequence or iterable of hashable objects as an argument to the class's constructor.


2 Answers

collections.Counter is actually a dictionary and they rely on hashing technique, so we really cannot access them by order. Since accessing by order is not possible, sorting a dictionary is out of question. But you can convert that to a list of tuples which correspond to key and value, and then sort that. For example,

print(Counter('abracadabra').most_common())
# [('a', 5), ('r', 2), ('b', 2), ('c', 1), ('d', 1)]
print(sorted(Counter('abracadabra').most_common(), key=lambda x: (-x[1], x[0])))
# [('a', 5), ('b', 2), ('r', 2), ('c', 1), ('d', 1)]

We make the sorted sort the (key, value) data given by most_common. We want to make sure that the items have to be sorted descending by their value and in ascending by the key. So, we use a little trick here. sorted will call the function we pass as the value for key parameter, for each of the elements in the sequence to be sorted. The value returned by that function will be used to represent that particular element during comparison with other elements. In our case, the key function goes like this

lambda x: (-x[1], x[0])

Here, x will get all elements eventually and it swaps the position of first and second elements and negates the actual count part. Since, the sorted, by default, sorts the data in ascending order, we make the biggest number the smallest and vice-versa. For example,

[2, 3, 1]

If you want to sort them in ascending order, the sorted will keep the smallest element at the beginning and the next smallest in the second position and so on, till it reaches the largest element. In our case, it becomes [1, 2, 3]. To sort the elements in the descending order, we make their negated values represent the actual numbers.

sorted([2, 3, 1], key=lambda x: -x)

Now, when sorted picks 2, it calls the key function to get the value to be used and it will return -2 and the same way, 1 will be -1, 3 will be -3. It will be placing the element with the smallest at the beginning. Since we got -3 for 3, 3 will be at the beginning, 2 will be next to it and 1 will be after it. So the result becomes [3, 2, 1].

We apply the same technique, to sort based on two items in an element. We first sort based on the count values by descending and if they match sort based on the key, ascending.

like image 137
thefourtheye Avatar answered Oct 13 '22 11:10

thefourtheye


The problem of sorting by several options (and different ordering) in case of tiebreaks can be solved using sorted() and the lambda function applied to the 'keys' parameter.

result=sorted(result,key=lambda x: (-x[2],x[0],x[1]))

The '-'ve sign for x[2] implies that sorting should be done first by the descending order of the 3rd element of 'result'. x[0], x[1] further indicates that ties should be broken by the ascending order of x[0] and x[1] in that exact order. Also refer 1

like image 28
narcissus789 Avatar answered Oct 13 '22 12:10

narcissus789