I have a dataframe like this:
A B C
12 true 1
12 true 1
3 nan 2
3 nan 3
I would like to drop rows where the value in column A is a duplicate, but only if the value in column B is 'true'.
The resulting dataframe I have in mind is:
A B C
12 true 1
3 nan 2
3 nan 3
I tried using: df.loc[df['B']=='true'].drop_duplicates('A', inplace=True, keep='first')
but it doesn't seem to work.
Thanks for your help!
The pandas DataFrame drop_duplicates() method removes duplicate rows. Its subset argument specifies which columns are used to identify duplicates. For example, to remove duplicate rows based only on a column named 'continent', pass that column name as subset.
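As a rough illustration, here is a minimal sketch of subset on the question's data. It also suggests why the attempted call appears to do nothing. The frame is a reconstruction, and it assumes B holds the boolean True (or NaN) rather than the string 'true'; the names deduped_on_a, true_rows and returned are made up for this example.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's frame; B is boolean True or NaN.
df = pd.DataFrame({
    "A": [12, 12, 3, 3],
    "B": [True, True, np.nan, np.nan],
    "C": [1, 1, 2, 3],
})

# subset="A": only column A is compared, so the second 12 and the second 3
# are both dropped, regardless of B.
deduped_on_a = df.drop_duplicates(subset="A", keep="first")

# The attempted pattern: .loc[...] builds a new frame. inplace=True modifies
# only that new object and returns None, so df itself keeps all four rows.
true_rows = df.loc[df["B"] == True]
returned = true_rows.drop_duplicates("A", keep="first", inplace=True)
```

So subset alone cannot express "only when B is true", and inplace=True on the result of .loc never reaches the original df.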
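You can use pd.concat: split the df by B, drop duplicates only in the part where B is True, then concatenate the two parts and restore the original order with sort_index.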
df=pd.concat([df.loc[df.B!=True],df.loc[df.B==True].drop_duplicates(['A'],keep='first')]).sort_index()
df
Out[1593]:
A B C
0 12 True 1
2 3 NaN 2
3 3 NaN 3
df[df.B.ne(True) | ~df.A.duplicated()]
A B C
0 12 True 1
2 3 NaN 2
3 3 NaN 3
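Both answers give the same result on the question's data, but they are not equivalent in general. The sketch below uses a hypothetical variant of the data (not from the question) in which the first A == 12 row has B = NaN, to show where they diverge. The names via_concat and via_mask are made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the question's data: the first A == 12 row has B = NaN.
df = pd.DataFrame({
    "A": [12, 12, 3, 3],
    "B": [np.nan, True, np.nan, np.nan],
    "C": [1, 1, 2, 3],
})

# Split-and-concat: duplicates of A are searched only among the B == True rows.
via_concat = pd.concat([
    df.loc[df.B != True],
    df.loc[df.B == True].drop_duplicates(["A"], keep="first"),
]).sort_index()

# Boolean mask: duplicates of A are searched across the whole frame.
via_mask = df[df.B.ne(True) | ~df.A.duplicated()]
```

Here via_concat keeps all four rows, because the single True row has no duplicate among the True rows. via_mask drops row 1, because its A value already appeared in row 0. Which behaviour is wanted depends on how "duplicate" is meant in the question.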