I have the dataframe below, which contains duplicate rows based on the column "No".
No   Reason
123  -
123  -
345  Bad Service
345  -
546  Bad Service
546  Poor feedback
I have subset these rows with:
df_duplicates = df[df['No'].duplicated(keep=False)]  # keep=False flags every copy, not just the later ones
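A short check of how `Series.duplicated` selects rows, using the question's sample data (with '-' standing for a missing reason). With the default `keep='first'`, only the second and later occurrences of each "No" are flagged, so one row of every pair is dropped from the subset; `keep=False` flags every row whose "No" appears more than once.

```python
import pandas as pd

# Data from the question; '-' marks a missing reason.
df = pd.DataFrame({
    "No": [123, 123, 345, 345, 546, 546],
    "Reason": ["-", "-", "Bad Service", "-", "Bad Service", "Poor feedback"],
})

# Default keep='first': only the second and later occurrences are flagged.
later_only = df[df["No"].duplicated()]

# keep=False: every row whose "No" appears more than once is flagged.
all_dups = df[df["No"].duplicated(keep=False)]

print(later_only.index.tolist())  # [1, 3, 5]
print(all_dups.index.tolist())    # [0, 1, 2, 3, 4, 5]
```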
I want to filter this subset so that a pair of duplicated rows is kept only when "Reason" is missing ("-") in at least one of its rows, i.e. in one or both of them.
The expected result is:
No   Reason
123  -
123  -
345  Bad Service
345  -
Right now I loop over the subset and handle each pair separately, but I am not sure whether there is a more efficient way to do this in pandas. Any leads would be appreciated.
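For comparison, a per-pair loop like the one described might look roughly like this. The question does not show its actual loop, so this is a reconstruction that assumes missing reasons are stored as the string '-'; it iterates over the groups produced by `df.groupby("No")` and keeps a whole group when any of its reasons is missing.

```python
import pandas as pd

# Data from the question; '-' marks a missing reason.
df = pd.DataFrame({
    "No": [123, 123, 345, 345, 546, 546],
    "Reason": ["-", "-", "Bad Service", "-", "Bad Service", "Poor feedback"],
})

# Loop over each group of rows sharing the same "No" and keep the whole
# group when at least one of its reasons is missing.
kept_groups = []
for _, group in df.groupby("No"):
    if group["Reason"].eq("-").any():
        kept_groups.append(group)

# pd.concat raises on an empty list, so fall back to an empty frame.
result_loop = pd.concat(kept_groups) if kept_groups else df.iloc[0:0]
print(result_loop)
```

Because `groupby` sorts by key by default, the rows come out grouped by "No". The Python-level loop runs once per group, which is what tends to make this slow on large frames.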
> filter them only when the "Reason" for the corresponding duplicated row is both missing OR if any one is missing.
You can do:
df[df['Reason'].eq('-').groupby(df['No']).transform('any')]
# or, if missing reasons are stored as NaN instead of '-': df[df['Reason'].isna().groupby(df['No']).transform('any')]
    No       Reason
0  123            -
1  123            -
2  345  Bad Service
3  345            -
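A self-contained sketch of the `groupby(...).transform('any')` approach on the question's sample data, assuming missing reasons are either the string '-' or NaN. It also shows an equivalent `GroupBy.filter` formulation and, using a hypothetical extra row with `No=789` (not in the question), how to combine the mask with `duplicated(keep=False)` so that a lone, non-duplicated row with a missing reason is not kept.

```python
import pandas as pd

# Data from the question; '-' marks a missing reason.
df = pd.DataFrame({
    "No": [123, 123, 345, 345, 546, 546],
    "Reason": ["-", "-", "Bad Service", "-", "Bad Service", "Poor feedback"],
})

# Per-row flag: is this row's reason missing?
missing = df["Reason"].eq("-")

# transform('any') broadcasts each group's result back to every row of that
# group, giving a boolean mask aligned with df's index.
group_has_missing = missing.groupby(df["No"]).transform("any")
result = df[group_has_missing]
print(result)

# Same idea when missing reasons are stored as NaN instead of '-'.
df_nan = df.assign(Reason=df["Reason"].mask(missing))
result_nan = df_nan[df_nan["Reason"].isna().groupby(df_nan["No"]).transform("any")]

# Equivalent GroupBy.filter version; the lambda runs once per group.
result_filter = df.groupby("No").filter(lambda g: g["Reason"].eq("-").any())

# Hypothetical extra row (No=789) that is NOT duplicated but has a missing
# reason: the transform mask alone keeps it, so combine the mask with
# duplicated(keep=False) if only duplicated "No" values should survive.
extra = pd.concat(
    [df, pd.DataFrame({"No": [789], "Reason": ["-"]})], ignore_index=True
)
extra_missing = extra["Reason"].eq("-").groupby(extra["No"]).transform("any")
kept_any = extra[extra_missing]
kept_dups_only = extra[extra_missing & extra["No"].duplicated(keep=False)]
```

The `transform('any')` mask is generally the better fit for large frames, since it avoids calling a Python function per group the way `filter` does.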