
Filter duplicate rows based on a condition in Pandas

I have the DataFrame below, which contains rows duplicated on the column "No" (their "Reason" values may differ).

No   Reason  
123  -
123  -
345  Bad Service
345  -
546  Bad Service
546  Poor feedback

I have subsetted these rows with:

df_duplicates = df[df['No'].duplicated()]
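One caveat worth noting: duplicated() by default flags only the second and later occurrences of each value, so the first row of every pair is left out of the subset. Passing keep=False flags every member of a duplicated group. A minimal, self-contained sketch (reconstructing the sample data from the question, with "-" standing for a missing Reason):

```python
import pandas as pd

# Sample data matching the question; "-" marks a missing Reason
df = pd.DataFrame({
    "No": [123, 123, 345, 345, 546, 546],
    "Reason": ["-", "-", "Bad Service", "-", "Bad Service", "Poor feedback"],
})

# duplicated() alone would flag only rows 1, 3, and 5;
# keep=False flags every row belonging to a duplicated "No"
df_duplicates = df[df["No"].duplicated(keep=False)]
print(len(df_duplicates))  # all 6 rows, since every No appears twice
```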

I am trying to loop over the above subset and keep a pair only when the "Reason" is missing for both of its rows, or for at least one of them.

Hence the result would be

No   Reason  
123  -
123  -
345  Bad Service
345  -

I am trying to loop over it and process each pair, but I am not sure whether there is a more efficient way to do this in Pandas. Any leads would be appreciated.

asked Mar 02 '26 by user3447653

1 Answer

"keep a pair only when the 'Reason' is missing for both of its rows, or for at least one of them"

You can do:

df[df['Reason'].eq('-').groupby(df['No']).transform('any')]
# or, if the missing values are real NaN rather than "-":
# df[df['Reason'].isna().groupby(df['No']).transform('any')]

    No       Reason
0  123            -
1  123            -
2  345  Bad Service
3  345            -
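To spell out how this works: eq('-') builds a Boolean mask of rows whose Reason is the "-" placeholder, and groupby(...).transform('any') broadcasts "does any row in this No group have a missing Reason?" back onto every row of that group, so the whole group is kept or dropped together. A self-contained sketch, reconstructing the question's sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "No": [123, 123, 345, 345, 546, 546],
    "Reason": ["-", "-", "Bad Service", "-", "Bad Service", "Poor feedback"],
})

# Per-row flag: is this Reason the "-" placeholder?
missing = df["Reason"].eq("-")

# Broadcast the group-level answer ("any missing in this No?") to every row
mask = missing.groupby(df["No"]).transform("any")

# Groups 123 and 345 each contain at least one "-", group 546 contains none
result = df[mask]
print(result)
```

This keeps both rows of each qualifying pair in a single vectorized pass, with no explicit loop over the pairs.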
answered Mar 04 '26 by anky


