Python: Removing duplicate CSV entries

Tags:

csv

I have a CSV file with multiple entries. Example csv:

user, phone, email
joe, 123, [email protected]
mary, 456, [email protected]
ed, 123, [email protected]

I'm trying to remove the duplicates by a specific column in the CSV; however, with the code below I'm getting a "list index out of range" error. I thought that by comparing row[1] with newrows[1] I would find all duplicates and write only the unique entries to file2.csv. This doesn't work, though, and I can't understand why.

f1 = csv.reader(open('file1.csv', 'rb'))
newrows = []
for row in f1:
    if row[1] not in newrows[1]:
        newrows.append(row)
writer = csv.writer(open("file2.csv", "wb"))
writer.writerows(newrows)

The end result should be a list that preserves the order of the file (a set won't work... right?), which should look like this:

user, phone, email
joe, 123, [email protected]
mary, 456, [email protected]

364

asked Oct 07 '11 03:10

serk

1 Answer

row[1] refers to the second column in the current row (phone). That's all well and good.

However, newrows.append(row) adds the entire row to the list, so newrows[1] is the second stored row, not a column of phone numbers. On the first pass newrows is still empty, so newrows[1] raises the "list index out of range" error.

Even if you checked row[1] in newrows, you would be checking an individual phone number against a list of complete rows, which is not what you want. You need to check against a list or set of just the phone numbers. For that, keep a set of the phone numbers seen so far and write each row as soon as its number turns out to be new.

Something like:

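import csv

# Python 2 idiom: the csv module expects files opened in binary mode.
f1 = csv.reader(open('file1.csv', 'rb'))
writer = csv.writer(open("file2.csv", "wb"))
phone_numbers = set()  # phone numbers already written
for row in f1:
    # Write a row only the first time its phone number appears.
    if row[1] not in phone_numbers:
        writer.writerow(row)
        phone_numbers.add(row[1])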
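
The 'rb' and 'wb' modes above are Python 2 style; on Python 3 the csv module works with text files and rejects a file opened in binary mode. Here is a rough sketch of the same idea for Python 3. The function name dedupe_by_column and its parameters are hypothetical, not part of the original answer. It assumes the header row should be kept (its "phone" value is simply treated as one more unique key) and that whitespace around the key should be ignored when comparing.

```python
import csv


def dedupe_by_column(src_path, dst_path, column=1):
    """Copy src_path to dst_path, keeping the first row seen for each value in `column`.

    Hypothetical helper for illustration; row order is preserved.
    """
    seen = set()  # keys already written to the output
    # newline="" lets the csv module handle line endings itself.
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        for row in reader:
            # Skip blank or short lines, which would raise IndexError below.
            if len(row) <= column:
                continue
            # Compare on the stripped value so " 123" and "123" count as equal.
            key = row[column].strip()
            if key not in seen:
                seen.add(key)
                writer.writerow(row)


if __name__ == "__main__":
    dedupe_by_column("file1.csv", "file2.csv", column=1)
```

The with block closes both files even if a row fails to parse, which the one-line open() calls do not guarantee. A set gives constant-time membership checks, and because rows are written as they are read, the original file order is kept without storing the rows in a list.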

125

answered Oct 13 '22 16:10

Winston Ewert
