
Filter duplicate rows based on a condition in Pandas

I have the DataFrame below, which contains rows duplicated on the column "No" (their "Reason" values may differ).

No   Reason  
123  -
123  -
345  Bad Service
345  -
546  Bad Service
546  Poor feedback

I have subsetted these rows with:

df_duplicates = df[df['No'].duplicated()]
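One caveat worth noting: duplicated() by default flags only the second and later occurrences of each value, so the first row of every pair is left out of the subset. Passing keep=False flags every member of a duplicated group. A minimal, self-contained sketch (reconstructing the sample data from the question, with "-" standing for a missing Reason):

```python
import pandas as pd

# Sample data matching the question; "-" marks a missing Reason
df = pd.DataFrame({
    "No": [123, 123, 345, 345, 546, 546],
    "Reason": ["-", "-", "Bad Service", "-", "Bad Service", "Poor feedback"],
})

# duplicated() alone would flag only rows 1, 3, and 5;
# keep=False flags every row belonging to a duplicated "No"
df_duplicates = df[df["No"].duplicated(keep=False)]
print(len(df_duplicates))  # all 6 rows, since every No appears twice
```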

I am trying to loop over the above subset and keep a pair only when the "Reason" is missing for both of its rows, or for at least one of them.

Hence the result would be

No   Reason  
123  -
123  -
345  Bad Service
345  -

I am trying to loop over it and process each pair, but I am not sure whether there is a more efficient way to do this in Pandas. Any leads would be appreciated.

asked Mar 02 '26 by user3447653

1 Answer

"keep a pair only when the 'Reason' is missing for both of its rows, or for at least one of them"

You can do:

df[df['Reason'].eq('-').groupby(df['No']).transform('any')]
# or, if the missing values are real NaN rather than "-":
# df[df['Reason'].isna().groupby(df['No']).transform('any')]

    No       Reason
0  123            -
1  123            -
2  345  Bad Service
3  345            -
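To spell out how this works: eq('-') builds a Boolean mask of rows whose Reason is the "-" placeholder, and groupby(...).transform('any') broadcasts "does any row in this No group have a missing Reason?" back onto every row of that group, so the whole group is kept or dropped together. A self-contained sketch, reconstructing the question's sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "No": [123, 123, 345, 345, 546, 546],
    "Reason": ["-", "-", "Bad Service", "-", "Bad Service", "Poor feedback"],
})

# Per-row flag: is this Reason the "-" placeholder?
missing = df["Reason"].eq("-")

# Broadcast the group-level answer ("any missing in this No?") to every row
mask = missing.groupby(df["No"]).transform("any")

# Groups 123 and 345 each contain at least one "-", group 546 contains none
result = df[mask]
print(result)
```

This keeps both rows of each qualifying pair in a single vectorized pass, with no explicit loop over the pairs.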
answered Mar 04 '26 by anky


