Pandas not removing duplicates

Tags: python, pandas

Consider the following script:

import pandas as pd

def start():
    df_dict = {"A": [1,2,3,3,4], "B": [1,2,2,3,4]}
    df = pd.DataFrame(df_dict)

    df.drop_duplicates(inplace=True, keep="last")

    print(df)

if __name__ == "__main__":
    start()

The duplicates in df are not removed. What could be the reason?

Current output:

   A  B
0  1  1
1  2  2
2  3  2
3  3  3
4  4  4

Expected output:

   A  B
0  1  1
1  2  2
3  3  3
4  4  4
Hussein Fawzy asked Feb 04 '26 16:02


1 Answer

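By default, .drop_duplicates() treats a row as a duplicate only when it matches another row in every column. No row of your dataframe is repeated across both A and B (for example, (3, 2) and (3, 3) differ in B), so nothing is dropped. To remove rows whose value repeats within a single column, use .drop_duplicates() with subset set to each of the two columns separately, then take the intersection of the two deduplicated dataframes with an inner merge. Also, instead of printing the resulting dataframe, it's probably more useful to have your function return it.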
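A quick way to check this is DataFrame.duplicated(), which flags the rows that drop_duplicates() would remove for the same subset and keep arguments. A minimal sketch using the question's data: with no subset, no row is flagged; restricting the comparison to one column flags every earlier copy of a repeated value, because keep='last' spares the last one.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 3, 4], "B": [1, 2, 2, 3, 4]})

# Default: a row is a duplicate only if ALL columns match another row
all_cols = df.duplicated(keep="last")

# Per column: flag each occurrence of a repeated value except the last one
dup_a = df.duplicated(subset="A", keep="last")
dup_b = df.duplicated(subset="B", keep="last")

print(all_cols.tolist())  # [False, False, False, False, False]
print(dup_a.tolist())     # [False, False, True, False, False]
print(dup_b.tolist())     # [False, True, False, False, False]
```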

import pandas as pd

def start():
    df_dict = {"A": [1,2,3,3,4], "B": [1,2,2,3,4]}
    df = pd.DataFrame(df_dict)

    # drop duplicates within each column
    df1 = df.drop_duplicates(subset='A', keep='last')
    df2 = df.drop_duplicates(subset='B', keep='last')

    # keep only rows present in both results (inner merge on the shared columns A and B)
    return pd.merge(df1, df2, how='inner')

if __name__ == "__main__":
    result = start() 

Output:

>>> result
   A  B
0  1  1
1  3  3
2  4  4
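Note that this result is not quite the expected output from the question: the inner merge keeps only rows that survive deduplication on both columns, and deduplicating on B with keep='last' keeps (3, 2) rather than (2, 2), so (2, 2) is dropped as well. If, as the expected output suggests, only repeated values in A should be removed, a single subset should be enough. The sketch below assumes that interpretation; it also shows a boolean-mask alternative (not part of the original answer) that, for this data, selects the same rows as the merge while keeping the original index labels.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 3, 4], "B": [1, 2, 2, 3, 4]})

# Assumption: only repeats in column A should be removed
# (this reproduces the question's expected output)
only_a = df.drop_duplicates(subset="A", keep="last")
print(only_a)
#    A  B
# 0  1  1
# 1  2  2
# 3  3  3
# 4  4  4

# Rows that are neither a repeat in A nor a repeat in B; unlike the merge,
# this keeps the original index labels instead of renumbering from 0
mask = ~df.duplicated(subset="A", keep="last") & ~df.duplicated(subset="B", keep="last")
both = df[mask]
print(both)
#    A  B
# 0  1  1
# 3  3  3
# 4  4  4
```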
Derek O answered Feb 06 '26 05:02


