Pandas not removing duplicates

Question

In the following script:

import pandas as pd

def start():
    df_dict = {"A": [1,2,3,3,4], "B": [1,2,2,3,4]}
    df = pd.DataFrame(df_dict)

    df.drop_duplicates(inplace = True, keep = "last")

    print(df)

if __name__ == "__main__":
    start()

The duplicates in df are not removed. What could be the reason?

Current output:

Expected output:

Derek O · Accepted Answer

By default, .drop_duplicates() treats a row as a duplicate only when its values match another row in every column. No two rows of your DataFrame are identical across both A and B, so nothing is dropped. To remove rows whose value repeats within either column, call .drop_duplicates() once per column using subset, then take the intersection of the two results with an inner merge. Instead of printing the resulting DataFrame, it's probably more useful to have your function return it.

import pandas as pd

def start():
    df_dict = {"A": [1,2,3,3,4], "B": [1,2,2,3,4]}
    df = pd.DataFrame(df_dict)

    # drop duplicates within each column
    df1 = df.drop_duplicates(subset='A', keep='last')
    df2 = df.drop_duplicates(subset='B', keep='last')

    # inner merge on the shared columns (A and B) keeps only rows present in both
    return pd.merge(df1, df2, how='inner')

if __name__ == "__main__":
    result = start()

Output:

>>> result
   A  B
0  1  1
1  3  3
2  4  4
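
To check why the original call removed nothing, and as a possible alternative to the merge, here is a minimal sketch using DataFrame.duplicated() with boolean masks. The variable names (full_row_dupes, dup_in_a, dup_in_b) are made up for illustration, and the mask approach is a swapped-in technique rather than what the code above does; for this data it should select the same rows.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 3, 4], "B": [1, 2, 2, 3, 4]})

# With no subset, a row is a duplicate only if every column matches another row.
# No two rows here are identical, so nothing is flagged.
full_row_dupes = df.duplicated(keep="last")
print(full_row_dupes.any())  # False

# Flag rows whose value repeats later within a single column.
dup_in_a = df.duplicated(subset="A", keep="last")
dup_in_b = df.duplicated(subset="B", keep="last")

# Keep only the rows that survive both per-column checks.
result = df[~(dup_in_a | dup_in_b)].reset_index(drop=True)
print(result)
```

The mask version avoids building two intermediate DataFrames, and dropping reset_index(drop=True) would preserve the original row labels if those matter.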

Tags: python, pandas

Hussein Fawzy
