I imagine this is an easy one for a decent Python dev; I'm still learning! Given a CSV with duplicate emails, I would like to iterate over it and write out how many times each email occurs, e.g.:
infile.csv
COLUMN 0
[email protected]
[email protected]
[email protected]
[email protected]
outfile.csv
COLUMN 0 COLUMN 1
[email protected] 2
[email protected] 1
[email protected] 1
So far I can remove duplicates with
import csv
f = csv.reader(open('infile.csv', 'rb'))
writer = csv.writer(open('outfile.csv', 'wb'))
emails = set()
for row in f:
    if row[0] not in emails:
        writer.writerow(row)
        emails.add(row[0])
but I am having trouble writing the count to a new column.
Use defaultdict, which is available in Python 2.6. Count every email first, then write the totals out in a second pass:
from collections import defaultdict
# count all the emails before we write anything out
emails = defaultdict(int)
for row in f:
    emails[row[0]] += 1
# now write the file: each item is an (email, count) pair
for row in emails.items():
    writer.writerow(row)
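For anyone on Python 3, the same count-then-write idea can be sketched with collections.Counter instead of defaultdict. This is only a sketch under a few assumptions: the function name write_email_counts is made up for illustration, the input has no header row, and the email is in the first column.

```python
import csv
from collections import Counter


def write_email_counts(in_path, out_path):
    """Count each first-column value and write (value, count) rows."""
    # newline="" is what the csv module docs recommend when opening files in Python 3
    with open(in_path, newline="") as src:
        # skip blank rows so row[0] cannot raise IndexError
        counts = Counter(row[0] for row in csv.reader(src) if row)

    with open(out_path, "w", newline="") as dst:
        # Counter is a dict subclass, so items() yields (email, count) pairs,
        # in first-seen order on Python 3.7+
        csv.writer(dst).writerows(counts.items())
    return counts
```

Calling write_email_counts('infile.csv', 'outfile.csv') would write one row per distinct email with its count in the second column. If the file does have a header row, it would need to be skipped before counting.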