I am processing a CSV file and counting how often each unique value in column 4 occurs. So far I have coded this three ways: the first uses "if key in dictionary", the second traps the KeyError, and the third uses defaultdict. For example (where x[3] is the value from the file and "a" is a dictionary):
First way:
if x[3] in a:
    a[x[3]] += 1
else:
    a[x[3]] = 1
Second way:
try:
    b[x[3]] += 1
except KeyError:
    b[x[3]] = 1
Third way:
from collections import defaultdict
c = defaultdict(int)
c[x[3]] += 1
My question is: which way is more efficient, cleaner, or otherwise better? Or is there a better way altogether? All three ways work and give the same answer, but I thought I would tap the hive mind as a learning case.
Thanks -
Use collections.Counter. Counter is a dict subclass that behaves much like defaultdict(int) for counting, but what's cool about it is that it accepts an iterable in the constructor, thus saving an extra step (I assume all of your examples above are wrapped in a for-loop).
from collections import Counter
count = Counter(x[3] for x in my_csv_reader)
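For anyone who wants to try this without a file on disk, here is a minimal sketch. The sample rows, the column names, and the presence of a header row are all assumptions made up for illustration; only csv.reader and Counter come from the standard library.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data standing in for the real CSV file.
sample = io.StringIO(
    "id,name,city,status\n"
    "1,ann,oslo,open\n"
    "2,bob,rome,closed\n"
    "3,cat,oslo,open\n"
)

my_csv_reader = csv.reader(sample)
next(my_csv_reader)  # skip the header row (assumes the file has one)

# Count the values in the fourth column (index 3).
count = Counter(x[3] for x in my_csv_reader)

print(count["open"])         # 2
print(count.most_common(1))  # [('open', 2)]
```

most_common is a handy extra you get with Counter: it returns the values sorted by how often they occur.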
Prior to the introduction of collections.Counter in Python 2.7, collections.defaultdict was the most idiomatic choice for this task, so on versions older than 2.7, use defaultdict.
from collections import defaultdict
count = defaultdict(int)
for x in my_csv_reader:
    count[x[3]] += 1
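A quick way to convince yourself that the two approaches agree is sketched below. The rows are hypothetical stand-ins for what a csv.reader would yield. The sketch also shows one small behavioural difference: reading a missing key adds an entry to a defaultdict but not to a Counter.

```python
from collections import Counter, defaultdict

# Hypothetical rows, shaped like what a csv.reader would yield.
rows = [
    ["1", "ann", "oslo", "open"],
    ["2", "bob", "rome", "closed"],
    ["3", "cat", "oslo", "open"],
]

# defaultdict version: explicit loop.
dd_count = defaultdict(int)
for x in rows:
    dd_count[x[3]] += 1

# Counter version: the iterable goes straight into the constructor.
c_count = Counter(x[3] for x in rows)

# Both hold the same counts.
print(dict(dd_count) == dict(c_count))  # True

# Reading a missing key: defaultdict stores a new 0 entry, Counter does not.
dd_count["missing"]
c_count["missing"]
print("missing" in dd_count)  # True
print("missing" in c_count)   # False
```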