I am processing a CSV file and counting how often each unique value occurs in column 4. So far I have coded this three ways: one uses "if key in dictionary", the second traps the KeyError, and the third uses "defaultdict". For example (where x[3] is the value from the file and "a" is a dictionary):
First way:
if x[3] in a:
    a[x[3]] += 1
else:
    a[x[3]] = 1
Second way:
try:
    b[x[3]] += 1
except KeyError:
    b[x[3]] = 1
Third way:
from collections import defaultdict
c = defaultdict(int)
c[x[3]] += 1
My question is: which way is more efficient, cleaner, better, etc.? Or is there a better way? All three ways work and give the same answer, but I thought I would tap the hive mind as a learning case.
Thanks -
Use collections.Counter. Counter is a dict subclass that behaves much like defaultdict(int) for counting (a missing key reads as 0), but what's cool about it is that it accepts an iterable in the constructor, saving an extra step. (I assume all of your examples above are wrapped in a for loop.)
from collections import Counter
count = Counter(x[3] for x in my_csv_reader)
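As a self-contained sketch of the same idea: the sample data below is made up, and it assumes the file has a header row and at least four columns. An in-memory string stands in for the real file so the snippet runs on its own.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data standing in for the real CSV file.
sample = io.StringIO(
    "id,name,city,status\n"
    "1,ann,oslo,active\n"
    "2,bob,rome,inactive\n"
    "3,cat,oslo,active\n"
)

my_csv_reader = csv.reader(sample)
next(my_csv_reader)  # skip the header row (assumed to be present)

# Count occurrences of each value in the fourth column (index 3).
count = Counter(row[3] for row in my_csv_reader)

# most_common() returns (value, count) pairs, highest count first.
top = count.most_common()
print(top)  # [('active', 2), ('inactive', 1)]
```

Looking up a value that never appeared, such as count["missing"], returns 0 rather than raising KeyError.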
Prior to the introduction of collections.Counter in Python 2.7, collections.defaultdict was the most idiomatic choice for this task, so on versions older than 2.7, use defaultdict.
from collections import defaultdict
count = defaultdict(int)
for x in my_csv_reader:
    count[x[3]] += 1
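To check that the approaches discussed here really agree, here is a rough sketch that runs each of them over the same rows. The rows list and the function names are invented for illustration; only the counting logic comes from the snippets above.

```python
from collections import Counter, defaultdict

# Hypothetical rows, as a csv reader would yield them (lists of strings).
rows = [
    ["1", "ann", "oslo", "active"],
    ["2", "bob", "rome", "inactive"],
    ["3", "cat", "oslo", "active"],
]

def count_with_in(rows):
    a = {}
    for x in rows:
        if x[3] in a:
            a[x[3]] += 1
        else:
            a[x[3]] = 1
    return a

def count_with_try(rows):
    b = {}
    for x in rows:
        try:
            b[x[3]] += 1
        except KeyError:
            b[x[3]] = 1
    return b

def count_with_defaultdict(rows):
    c = defaultdict(int)
    for x in rows:
        c[x[3]] += 1
    return c

def count_with_counter(rows):
    return Counter(x[3] for x in rows)

results = [
    count_with_in(rows),
    count_with_try(rows),
    count_with_defaultdict(rows),
    count_with_counter(rows),
]

# dict, defaultdict and Counter compare equal when their items match.
all_equal = all(dict(r) == {"active": 2, "inactive": 1} for r in results)
print(all_equal)  # True
```

One behavioural difference to keep in mind: reading a missing key from a defaultdict inserts that key with a value of 0, while reading a missing key from a Counter returns 0 without adding it.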