I use pandas.DataFrame.drop_duplicates() to drop rows whose column values are all identical, but for data-quality analysis I need a DataFrame containing the dropped duplicate rows. How can I identify which rows will be dropped? One idea is to compare the original DataFrame against the de-duplicated one and find the indexes that went missing, but is there a better way?
Example:
import pandas as pd
data = [[1, 'A'], [2, 'B'], [3, 'C'], [1, 'A'], [1, 'A']]
df = pd.DataFrame(data, columns=['Numbers', 'Letters'])
df.drop_duplicates(keep='first', inplace=True)  # This drops the rows at index 3 and 4
# Now how do I create a DataFrame containing only the dropped duplicate rows?
Use df.duplicated() to select exactly the rows that drop_duplicates() removes. First, the de-duplicated frame:
import pandas as pd
data = [[1, 'A'], [2, 'B'], [3, 'C'], [1, 'A'], [1, 'A']]
df = pd.DataFrame(data, columns=['Numbers', 'Letters'])
df.drop_duplicates()
Output
Numbers Letters
0 1 A
1 2 B
2 3 C
and the dropped duplicate rows:
df.loc[df.duplicated()]
Output
Numbers Letters
3 1 A
4 1 A
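One detail worth noting: the keep argument of duplicated() should mirror the one passed to drop_duplicates(), since both methods flag occurrences the same way. A minimal sketch using the example data above (keep='first' is the default for both; keep=False flags every member of a duplicate group):

```python
import pandas as pd

data = [[1, 'A'], [2, 'B'], [3, 'C'], [1, 'A'], [1, 'A']]
df = pd.DataFrame(data, columns=['Numbers', 'Letters'])

# The kept rows and the dropped rows are complementary,
# as long as the same `keep` argument is used for both calls.
kept = df.drop_duplicates(keep='first')
dropped = df.loc[df.duplicated(keep='first')]
print(dropped)  # rows at index 3 and 4

# With keep=False, *all* members of a duplicate group are flagged,
# which is often what you want for a data-quality report.
all_dupes = df.loc[df.duplicated(keep=False)]
print(all_dupes)  # rows at index 0, 3 and 4
```

Both slices keep the original index labels, so you can join them back to the source data or count duplicates per group without any extra bookkeeping.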