 

Counter.most_common(n) how to override arbitrary ordering

Can I accomplish this rank/sort using Counter.most_common() alone, and thus avoid this line: d = sorted(d.items(), key=lambda x: (-x[1],x[0]), reverse=False)?

Challenge: You are given a string. The string contains only lowercase English alphabet characters. Your task is to find the top three most common characters in the string.

Output Format: Print the three most common characters, each on a separate line along with its occurrence count. Sort the output in descending order of occurrence count. If the occurrence count is the same, sort the characters in ascending order.

In completing this I used dict, Counter, and sorted() to ensure that "if the occurrence count is the same, sort the characters in ascending order". The built-in sorted() function orders by count, then alphabetically. I'm curious whether there is a way to override the default, arbitrary tie-breaking order of Counter.most_common(), as it seems to disregard the lexicographical order of the results when picking the top 3.

import sys
from collections import Counter

string = sys.stdin.readline().strip()
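# most_common() with no n returns every character; truncating to 3 here
# could drop a tied character that should win the alphabetical tie-break
d = dict(Counter(string).most_common())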
d = sorted(d.items(), key=lambda x: (-x[1],x[0]), reverse=False)

for letter, count in d[:3]:
    print(letter, count)
Michael B asked Mar 28 '17 17:03
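As a minimal sketch of getting the top three without a separate sorted() pass, heapq.nsmallest can select the three smallest items under the same (-count, character) key the question already uses. This swaps in heapq rather than Counter.most_common(), and the helper name top_three is hypothetical.

```python
import heapq
from collections import Counter


def top_three(s):
    """Return the three most common characters as (char, count) pairs.

    The key (-count, char) makes a higher count sort first, and for
    equal counts the alphabetically earlier character sorts first.
    """
    counts = Counter(s)
    # nsmallest picks the n smallest items by key; no separate sort/slice step
    return heapq.nsmallest(3, counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

With the question's input handling, usage would be `for letter, count in top_three(sys.stdin.readline().strip()): print(letter, count)`.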


People also ask

Is Counter in Python ordered?

Counter is a dict subclass in which elements are stored as dictionary keys and their counts as dictionary values. Before Python 3.7 its order was not guaranteed; since 3.7, as a dict subclass, it remembers insertion order. Counts can be any integer, including zero or negative values, and the class itself places no restrictions on its keys and values.
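A short sketch of the points above, assuming Python 3.7+ (where insertion order is guaranteed):

```python
from collections import Counter

c = Counter("banana")

# Keys remember the order in which each element was first seen (Python 3.7+)
print(list(c))  # ['b', 'a', 'n']

# A missing element reports a count of 0 instead of raising KeyError
print(c["z"])  # 0

# subtract() can drive counts to zero or below; such entries are kept
c.subtract({"b": 3})
print(c["b"])  # -2
```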

What is the time complexity of Counter in Python?

As the source code shows, Counter is just a subclass of dict. Constructing it is O(n), because it has to iterate over the input, but operations on individual elements remain O(1).
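To make the O(n) claim concrete, here is a hand-rolled equivalent using a plain dict: one average O(1) update per element, so O(n) overall. The name count_chars is hypothetical; it mirrors what Counter produces for this input, not Counter's actual source.

```python
from collections import Counter


def count_chars(s):
    counts = {}
    for ch in s:
        # dict.get and item assignment are O(1) on average, so the loop is O(n)
        counts[ch] = counts.get(ch, 0) + 1
    return counts


# Counter is a dict subclass, so converting it gives a directly comparable dict
assert issubclass(Counter, dict)
print(count_chars("banana") == dict(Counter("banana")))  # True
```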

How does most_common() work in Python?

Counter() returns a dict subclass. Its most_common() method returns a list of (element, count) pairs sorted from the most common to the least common; most_common(n) returns only the n most common elements.
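The sketch below roughly mirrors CPython's approach: rank by count alone, using heapq.nlargest when n is given and a full sort otherwise. Because the key is only the count, ties keep the counter's iteration order rather than alphabetical order. The helper most_common_like is hypothetical and is not the library's code.

```python
import heapq
from collections import Counter
from operator import itemgetter


def most_common_like(counter, n=None):
    """Rank (element, count) pairs by count only, highest first."""
    if n is None:
        # sorted() is stable even with reverse=True, so equal counts
        # keep the counter's iteration (insertion) order
        return sorted(counter.items(), key=itemgetter(1), reverse=True)
    # nlargest is equivalent to sorted(..., reverse=True)[:n]
    return heapq.nlargest(n, counter.items(), key=itemgetter(1))


c = Counter("abracadabra")
print(most_common_like(c, 2))  # [('a', 5), ('b', 2)]
print(most_common_like(c) == c.most_common())  # True
```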


1 Answer

Yes, the docs explicitly say that when counts are equal, the tie-breaking order of Counter.most_common() is arbitrary.

  • UPDATE: PM2Ring pointed out that Counter inherits dict's ordering. Insertion ordering only happens in 3.6+, and is only guaranteed from 3.7. It's possible the docs are lagging.
  • In CPython 3.6+, ties fall back on the original insertion order (see the bottom), but don't rely on that implementation detail because, per the spec, it's not defined behavior. As you say, it's best to do your own sort if you want fully deterministic behavior.
  • I show below how you can monkey-patch Counter.most_common with your own sort function like the one you show, but that's frowned on. (Code you write might accidentally rely on it and hence break when it isn't patched.)
  • You could subclass Counter as MyCounter so that you can override its most_common(). Painful and not really portable.
  • Really, the best approach is to write code and tests that don't rely on the arbitrary tie-breaking order from most_common().
  • I agree that most_common() should not have been hardwired; we should be able to pass a comparison key or sort function into __init__().
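For the subclassing option, a minimal sketch might look like this. MyCounter is a hypothetical name, and the override assumes the elements are mutually comparable (true for single characters). It changes only most_common(); in CPython, arithmetic such as + still returns a plain Counter rather than a MyCounter, which is part of why this approach gets painful.

```python
from collections import Counter


class MyCounter(Counter):
    """Counter whose most_common() breaks count ties alphabetically."""

    def most_common(self, n=None):
        # Descending count, then ascending element for equal counts
        ranked = sorted(self.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked if n is None else ranked[:n]


print(MyCounter("ccbaab").most_common())  # [('a', 2), ('b', 2), ('c', 2)]
print(MyCounter("ccbaab").most_common(1))  # [('a', 2)]
```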

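Monkey-patching Counter.most_common():

import collections

def patched_most_common(self, n=None):
    # Descending count, then ascending element for equal counts;
    # n is honoured so calls like most_common(3) keep working
    ranked = sorted(self.items(), key=lambda x: (-x[1], x[0]))
    return ranked if n is None else ranked[:n]

collections.Counter.most_common = patched_most_common

>>> collections.Counter('ccbaab')
Counter({'a': 2, 'b': 2, 'c': 2})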
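If you do patch, one way to keep the patch from leaking into other code is to scope it with unittest.mock.patch.object, which restores the original method when the with block exits. This is a sketch; ranked_most_common is a hypothetical replacement function written for the example.

```python
from collections import Counter
from unittest import mock


def ranked_most_common(self, n=None):
    # Descending count, then ascending element for equal counts
    ranked = sorted(self.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked if n is None else ranked[:n]


with mock.patch.object(Counter, "most_common", ranked_most_common):
    # Inside the block every Counter uses the deterministic ordering
    inside = Counter("ccbaab").most_common()

# After the block the original method is restored
outside = Counter("ccbaab").most_common()

print(inside)  # [('a', 2), ('b', 2), ('c', 2)]
print(outside)  # [('c', 2), ('b', 2), ('a', 2)] on CPython 3.7+
```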

Demonstrating that in CPython 3.7, the arbitrary order is the order of insertion (first insertion of each character), in a fresh session without the patch above:

>>> Counter('abccba').most_common()
[('a', 2), ('b', 2), ('c', 2)]

>>> Counter('ccbaab').most_common()
[('c', 2), ('b', 2), ('a', 2)]
smci answered Sep 21 '22 13:09