 

Python: Collections.Counter vs defaultdict(int)

Suppose I have some data that looks like the following.

Lucy = 1
Bob = 5
Jim = 40
Susan = 6
Lucy = 2
Bob = 30
Harold = 6

I want to combine them by:

  1. removing duplicate keys, and
  2. adding up the values for those duplicate keys.

That means I'd end up with these key/value pairs:

Lucy = 3
Bob = 35
Jim = 40
Susan = 6
Harold = 6

Would it be better to use a Counter or a defaultdict (both from collections) for this?
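For reference, here is a minimal sketch of both options. It assumes the data arrives as a list of (name, value) pairs, which the question does not actually specify. Both containers treat a missing key as 0 for `+=`, so the summing loop is the same either way:

```python
from collections import Counter, defaultdict

# Assumption: the records are (name, value) pairs, in the order given above.
pairs = [("Lucy", 1), ("Bob", 5), ("Jim", 40), ("Susan", 6),
         ("Lucy", 2), ("Bob", 30), ("Harold", 6)]

# defaultdict(int): a missing key is created with int() == 0 before += runs.
totals_dd = defaultdict(int)
for name, value in pairs:
    totals_dd[name] += value

# Counter: a missing key reads as 0, so += works the same way.
totals_c = Counter()
for name, value in pairs:
    totals_c[name] += value

print(dict(totals_dd))  # {'Lucy': 3, 'Bob': 35, 'Jim': 40, 'Susan': 6, 'Harold': 6}
print(dict(totals_c))   # same totals
```

For plain summing like this the two are interchangeable; the differences only show up in what you do with the result afterwards.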

asked Nov 09 '13 20:11 by covariance




2 Answers

Both Counter and defaultdict(int) work fine here, but there are a few differences between them:

  • Counter supports most of the operations you can do on a multiset, so if you need those operations, go for Counter.

  • Counter won't add new keys to the dict when you look up a missing key. So if your lookups may include keys that are not in the dict, Counter is the better choice.

Example:

>>> from collections import Counter, defaultdict
>>> c = Counter()
>>> d = defaultdict(int)
>>> c[0], d[1]
(0, 0)
>>> c
Counter()
>>> d
defaultdict(<class 'int'>, {1: 0})


  • Counter also has a most_common() method that returns items sorted by count, highest first. To get the same result from a defaultdict, you have to call sorted() yourself.

Example:

>>> from collections import Counter
>>> c = Counter('aaaaaaaaabbbbbbbcc')
>>> c.most_common()
[('a', 9), ('b', 7), ('c', 2)]
>>> c.most_common(2)  # return the 2 most common items and their counts
[('a', 9), ('b', 7)]

  • Counter can also give you back the individual elements via elements(), each repeated as many times as its count.

Example:

>>> from collections import Counter
>>> c = Counter({'a': 5, 'b': 3})
>>> list(c.elements())
['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b']
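The multiset operations mentioned in the first point are the operators Counter defines between two counters. A short sketch with made-up counts (the values are illustrative only):

```python
from collections import Counter

a = Counter(x=3, y=1)
b = Counter(x=1, y=2, z=4)

print(a + b)  # add counts:                       Counter({'x': 4, 'z': 4, 'y': 3})
print(a - b)  # subtract, keep only positives:    Counter({'x': 2})
print(a & b)  # intersection, min of each count:  Counter({'x': 1, 'y': 1})
print(a | b)  # union, max of each count:         Counter({'z': 4, 'x': 3, 'y': 2})
```

Note that subtraction and the min/max operators drop any key whose resulting count is zero or negative.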

So, depending on what you want to do with the resulting dict, you can choose between Counter and defaultdict(int).
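If you go with defaultdict(int), the Counter conveniences above can be approximated with standard tools. Here is a sketch reusing the sample data from the most_common() and elements() examples: .get() reads a missing key without inserting it, sorted() stands in for most_common(), and itertools.chain/repeat stands in for elements().

```python
from collections import defaultdict
from itertools import chain, repeat

d = defaultdict(int)
for ch in 'aaaaaaaaabbbbbbbcc':
    d[ch] += 1

# Reading with .get() does not call the default factory, so no key is added.
print(d.get('z', 0))  # 0
print('z' in d)       # False

# Stand-in for Counter.most_common(): sort items by count, highest first.
by_count = sorted(d.items(), key=lambda kv: kv[1], reverse=True)
print(by_count)       # [('a', 9), ('b', 7), ('c', 2)]
print(by_count[:2])   # [('a', 9), ('b', 7)]

# Stand-in for Counter.elements(): repeat each key as many times as its count.
e = defaultdict(int, {'a': 5, 'b': 3})
print(list(chain.from_iterable(repeat(k, n) for k, n in e.items())))
# ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b']
```

It works, but it is more code to maintain than calling the Counter methods directly.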

answered Oct 12 '22 10:10 by Ashwini Chaudhary


defaultdict(int) seems to be faster:

In [1]: from collections import Counter, defaultdict

In [2]: def test_counter():
   ...:     c = Counter()
   ...:     for i in range(10000):
   ...:         c[i] += 1
   ...:

In [3]: def test_defaultdict():
   ...:     d = defaultdict(int)
   ...:     for i in range(10000):
   ...:         d[i] += 1
   ...:

In [4]: %timeit test_counter()
5.28 ms ± 1.2 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [5]: %timeit test_defaultdict()
2.31 ms ± 68.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
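For readers not using IPython, here is a sketch of the same measurement with the standard timeit module. It uses a smaller run (5 repeats of 20 calls) than the session above, and absolute numbers will vary by machine and Python version, so none are shown here. One plausible reason for the gap is that Counter handles a missing key in a Python-level __missing__ method, while defaultdict handles it in C.

```python
import timeit
from collections import Counter, defaultdict

N = 10_000  # same key range as the IPython session above

def test_counter():
    c = Counter()
    for i in range(N):
        c[i] += 1  # each new key is read via Counter.__missing__ (returns 0)
    return c

def test_defaultdict():
    d = defaultdict(int)
    for i in range(N):
        d[i] += 1  # each new key is filled by the int() default factory
    return d

if __name__ == "__main__":
    for fn in (test_counter, test_defaultdict):
        # Best of 5 repeats, 20 calls each; report milliseconds per call.
        best = min(timeit.repeat(fn, number=20, repeat=5)) / 20
        print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```

Taking the minimum of the repeats, rather than the mean, is a common way to reduce noise from other processes.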
answered Oct 12 '22 11:10 by ImPerat0R_