Python: collections.Counter vs defaultdict(int)

Tags:

python

dictionary

Suppose I have some data that looks like the following.

Lucy = 1
Bob = 5
Jim = 40
Susan = 6
Lucy = 2
Bob = 30
Harold = 6

I want to combine the entries:

remove duplicate keys, and
add the values for these duplicate keys.

That means I'd get the key/values:

Lucy = 3
Bob = 35
Jim = 40
Susan = 6
Harold = 6

Would it be better to use a Counter or a defaultdict (both from collections) for this?

asked Nov 09 '13 20:11

covariance

2 Answers

Both Counter and defaultdict(int) work fine here, but there are a few differences between them:

Counter supports most of the operations you can do on a multiset. So, if you want to use those operations, go for Counter.
Counter won't add new keys to the dict when you query for missing keys. So, if your queries include keys that may not be present in the dict, Counter is the better choice.

Example:

>>> from collections import Counter, defaultdict
>>> c = Counter()
>>> d = defaultdict(int)
>>> c[0], d[1]
(0, 0)
>>> c
Counter()
>>> d
defaultdict(<class 'int'>, {1: 0})
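The multiset operations mentioned above can be sketched as follows. This is only an illustration: the names a and b and their values are made up for the example and are not part of the question's data.

```python
from collections import Counter

# Two small multisets; names and values are illustrative only.
a = Counter({'Lucy': 3, 'Bob': 1})
b = Counter({'Lucy': 1, 'Bob': 2, 'Jim': 4})

total = a + b        # add counts key by key
difference = a - b   # subtract counts, dropping results that are zero or negative
common = a & b       # intersection: minimum of each count
either = a | b       # union: maximum of each count

for label, result in [('a + b', total), ('a - b', difference),
                      ('a & b', common), ('a | b', either)]:
    print(label, dict(result))
```

A defaultdict(int) has none of these operators, so the same results would need explicit loops.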

Counter also has a method called most_common that allows you to sort items by their count. To get the same thing in defaultdict you'll have to use sorted.

Example:

>>> c = Counter('aaaaaaaaabbbbbbbcc')
>>> c.most_common()
[('a', 9), ('b', 7), ('c', 2)]
>>> c.most_common(2)  # return the 2 most common items and their counts
[('a', 9), ('b', 7)]
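For comparison, here is a minimal sketch of getting the same ordering from a defaultdict(int) with sorted. The variable names (text, by_count, top_two) are chosen for this example only.

```python
from collections import Counter, defaultdict

text = 'aaaaaaaaabbbbbbbcc'

# Count characters by hand with a defaultdict(int).
d = defaultdict(int)
for ch in text:
    d[ch] += 1

# Sort (key, count) pairs by count, largest first.
by_count = sorted(d.items(), key=lambda item: item[1], reverse=True)
top_two = by_count[:2]

# Both comparisons should print True for this input (no tied counts).
print(by_count == Counter(text).most_common())
print(top_two == Counter(text).most_common(2))
```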

Counter also allows you to create a list of elements from the Counter object.

Example:

>>> c = Counter({'a': 5, 'b': 3})
>>> list(c.elements())
['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b']

So, depending on what you want to do with the resulting dict you can choose between Counter and defaultdict(int).
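For the data in the question, either type can do the summing. The sketch below assumes the data is available as a list of (name, value) pairs; that representation, and the names pairs, counter_totals and dict_totals, are assumptions made for this example.

```python
from collections import Counter, defaultdict

# The question's data as (name, value) pairs (assumed representation).
pairs = [('Lucy', 1), ('Bob', 5), ('Jim', 40), ('Susan', 6),
         ('Lucy', 2), ('Bob', 30), ('Harold', 6)]

counter_totals = Counter()
dict_totals = defaultdict(int)
for name, value in pairs:
    # A missing key starts at 0 in both types, so += works on first sight.
    counter_totals[name] += value
    dict_totals[name] += value

print(dict(counter_totals))
print(dict(dict_totals))
```

Both loops are identical, so for this task alone the choice comes down to which extra behaviour you want afterwards.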

answered Oct 12 '22 10:10

Ashwini Chaudhary

defaultdict(int) seems to be faster.

In [1]: from collections import Counter, defaultdict

In [2]: def test_counter():
   ...:     c = Counter()
   ...:     for i in range(10000):
   ...:         c[i] += 1
   ...:

In [3]: def test_defaultdict():
   ...:     d = defaultdict(int)
   ...:     for i in range(10000):
   ...:         d[i] += 1
   ...:

In [4]: %timeit test_counter()
5.28 ms ± 1.2 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [5]: %timeit test_defaultdict()
2.31 ms ± 68.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
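Outside IPython, a similar comparison can be sketched with the standard timeit module. The function names and the repeat/number settings below are arbitrary choices for this example, and the timings will vary by machine and Python version.

```python
import timeit
from collections import Counter, defaultdict

def fill_counter(n=10000):
    c = Counter()
    for i in range(n):
        c[i] += 1   # every key is new, so this always hits the missing-key path
    return c

def fill_defaultdict(n=10000):
    d = defaultdict(int)
    for i in range(n):
        d[i] += 1
    return d

# Best of 3 repeats, 20 calls each (small numbers chosen to keep the run short).
counter_time = min(timeit.repeat(fill_counter, number=20, repeat=3))
defaultdict_time = min(timeit.repeat(fill_defaultdict, number=20, repeat=3))

print(f'Counter:          {counter_time:.4f} s for 20 calls')
print(f'defaultdict(int): {defaultdict_time:.4f} s for 20 calls')
```

Note that this loop only inserts new keys; a workload that mostly increments existing keys may show a different gap.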

answered Oct 12 '22 11:10

ImPerat0R_
