How to remove duplicates in a csv file based on two columns?

I have a CSV file like this:

column1    column2

john       kerry
adam       stephenson
ashley     hudson
john       kerry
etc.

I want to remove the duplicate rows from this file, so that I get only:

column1    column2

john       kerry
adam       stephenson
ashley     hudson

I wrote this script, which removes duplicates based on the last name only, but I need to remove duplicates based on the last name AND the first name.

import csv

# Note: this version only compares the last name (row[1]).
with open('myfilewithduplicates.csv', 'r', newline='') as infile, \
     open('myfilewithoutduplicates.csv', 'w', newline='') as outfile:
    reader = csv.reader(infile, delimiter=',')
    writer = csv.writer(outfile, delimiter=',')

    lastnames = set()
    for row in reader:
        if row[1] not in lastnames:
            writer.writerow(row)
            lastnames.add(row[1])

You can now use the .drop_duplicates method in pandas. I would do the following:

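import pandas as pd

toclean = pd.read_csv('myfilewithduplicates.csv')
# Column names taken from the sample header in the question.
deduped = toclean.drop_duplicates(subset=['column1', 'column2'])
# index=False keeps the pandas row index out of the output file.
deduped.to_csv('myfilewithoutduplicates.csv', index=False)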
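
If you would rather stay with the csv module from the question, one possible sketch is to store a (first name, last name) tuple in the set instead of only the last name. The helper name dedupe_rows, the key_columns parameter, and the in-memory sample data are assumptions made for illustration. The sketch also assumes that the file is comma-delimited and that every row contains both columns.

```python
import csv
import io


def dedupe_rows(rows, key_columns=(0, 1)):
    """Yield each row whose key-column values have not been seen before."""
    seen = set()
    for row in rows:
        # The tuple of both columns is the identity of a row.
        key = tuple(row[i] for i in key_columns)
        if key not in seen:
            seen.add(key)
            yield row


# In-memory stand-in for myfilewithduplicates.csv (assumed comma-delimited).
sample = (
    "column1,column2\n"
    "john,kerry\n"
    "adam,stephenson\n"
    "ashley,hudson\n"
    "john,kerry\n"
)

reader = csv.reader(io.StringIO(sample), delimiter=',')
output = io.StringIO()
writer = csv.writer(output, delimiter=',', lineterminator='\n')
writer.writerows(dedupe_rows(reader))
result = output.getvalue()
```

To use it on real files, you would swap the two io.StringIO objects for files opened with newline=''. The first occurrence of each pair is kept and the original row order is preserved, which matches the behaviour of the script in the question.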

Tags:

python

Reveclair

1 Answers

Bradley
