Removing items from a Counter whose counts are below a threshold

Tags:

python

I have a counter declared as main_dict = Counter(), and values are added with main_dict[word] += 1. At the end I want to remove all elements whose frequency is less than 15. Is there a function in Counter to do this?

Any help appreciated.

Aman Deep Gautam asked Apr 07 '13 11:04



2 Answers

>>> from collections import Counter
>>> counter = Counter({'baz': 20, 'bar': 15, 'foo': 10})
>>> Counter({k: c for k, c in counter.items() if c >= 15})
Counter({'baz': 20, 'bar': 15})
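If the same filter is needed in more than one place, the comprehension above could be wrapped in a small helper. This is only a sketch: keep_at_least is a hypothetical name, not something collections provides, and it assumes you are fine with getting a new Counter back instead of changing the original.

```python
from collections import Counter


def keep_at_least(counter, threshold):
    """Return a new Counter holding only entries with count >= threshold.

    keep_at_least is a hypothetical helper name, not part of collections.
    The input counter is left unmodified.
    """
    return Counter(
        {key: count for key, count in counter.items() if count >= threshold}
    )


main_dict = Counter({'baz': 20, 'bar': 15, 'foo': 10})
filtered = keep_at_least(main_dict, 15)

print(filtered)        # Counter({'baz': 20, 'bar': 15})
print(len(main_dict))  # 3, the original still has 'foo'
```

Because the result is a new object, any other reference to the old counter still sees the low-count entries; rebind the name (main_dict = keep_at_least(main_dict, 15)) if that is what you want.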
jamylak answered Sep 21 '22 20:09


No, you'll need to remove them manually. Using itertools.dropwhile() makes that a little easier perhaps:

from itertools import dropwhile

for key, count in dropwhile(lambda key_count: key_count[1] >= 15, main_dict.most_common()):
    del main_dict[key]

Demonstration:

>>> main_dict
Counter({'baz': 20, 'bar': 15, 'foo': 10})
>>> for key, count in dropwhile(lambda key_count: key_count[1] >= 15, main_dict.most_common()):
...     del main_dict[key]
...
>>> main_dict
Counter({'baz': 20, 'bar': 15})

Because most_common() returns the items sorted from highest to lowest count, dropwhile only has to test entries until it reaches the first count below 15; after that it stops testing and just passes everything else through. If there are a lot of values below 15, that saves the execution time of all those tests.
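For comparison, here is a minimal sketch of a different in-place approach that skips the sort and simply tests every entry. It is not the dropwhile technique described above, just a plain loop, and the sample data (including the extra 'qux' key) is made up for illustration.

```python
from collections import Counter

# Made-up sample data for illustration.
main_dict = Counter({'baz': 20, 'bar': 15, 'foo': 10, 'qux': 3})

# Take a snapshot of the items with list() first: in Python 3, deleting
# from a dict while iterating over it directly raises RuntimeError.
for key, count in list(main_dict.items()):
    if count < 15:
        del main_dict[key]

print(main_dict)  # Counter({'baz': 20, 'bar': 15})
```

This tests every key once but avoids sorting the whole counter, so which variant is faster depends on the data; time both on your own input if it matters.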

Martijn Pieters answered Sep 21 '22 20:09