
collections.Counter: most_common INCLUDING equal counts

In collections.Counter, the method most_common(n) returns only the n most frequent items. I need exactly that, but I also need to include every item whose count ties with the n-th one.

from collections import Counter
test = Counter(["A","A","A","B","B","C","C","D","D","E","F","G","H"])
-->Counter({'A': 3, 'C': 2, 'B': 2, 'D': 2, 'E': 1, 'G': 1, 'F': 1, 'H': 1})
test.most_common(2)
-->[('A', 3), ('C', 2)]

I would need [('A', 3), ('B', 2), ('C', 2), ('D', 2)], since B, C and D all have the same count as the 2nd item when n=2. My real data is DNA code and could be quite large, so the solution needs to be reasonably efficient.

KarelCote asked Nov 09 '14 17:11

1 Answer

You can do something like this:

from collections import Counter
from itertools import takewhile

def get_items_upto_count(dct, n):
  data = dct.most_common()  # all items, sorted by descending count
  val = data[n-1][1]  # count of the n-th most common item
  # Collect all items whose count is greater than or equal to `val`.
  return list(takewhile(lambda x: x[1] >= val, data))

test = Counter(["A","A","A","B","B","C","C","D","D","E","F","G","H"])

print(get_items_upto_count(test, 2))
# [('A', 3), ('B', 2), ('C', 2), ('D', 2)]  (order among ties can differ on older Pythons)
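
Since the question mentions large data, here is a variant sketch that is not part of the answer above. It avoids sorting the whole counter: most_common(n) finds the n-th highest count (in CPython it uses heapq.nlargest when n is given), and a single pass then collects everything at or above that count. The name most_common_with_ties is made up for illustration, and the tie-break by key assumes the keys are mutually comparable (true for strings such as DNA k-mers).

```python
from collections import Counter

def most_common_with_ties(counter, n):
    """Return the n most common items plus any items tied with the n-th one."""
    if n <= 0 or not counter:
        return []
    # Count of the n-th most common item (or the smallest count if n is too big).
    threshold = counter.most_common(n)[-1][1]
    # One linear pass over the counter keeps everything at or above the threshold.
    result = [(key, count) for key, count in counter.items() if count >= threshold]
    # Sort only the survivors: descending count, then key for a stable order.
    # Assumption: keys can be compared with each other.
    result.sort(key=lambda item: (-item[1], item[0]))
    return result

test = Counter(["A","A","A","B","B","C","C","D","D","E","F","G","H"])
print(most_common_with_ties(test, 2))
# [('A', 3), ('B', 2), ('C', 2), ('D', 2)]
```

Whether this is actually faster depends on how many distinct keys there are and how many survive the threshold, so it is worth timing on real data.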
Ashwini Chaudhary answered Nov 08 '22 22:11