I would like to filter rows containing a duplicate in column X from a dataframe. However, if there are duplicates for a value in X, I would like to give preference to one of them based on the value of another column Y. For example:
import pandas as pd
print pd.__version__
x = pd.DataFrame([
['best', 'a', 'x'],
['worst', 'b', 'y'],
['best', 'c', 'x'],
['worst','d', 'y'],
['best','d', 'y'],
['worst','d', 'y'],
['best','d', 'z'],
['best','d', 'z'],
], columns=['a', 'b', 'c'])
print x
x.drop_duplicates(cols='c', inplace=True)
print x
a b c
0 best a x
1 worst b y
2 best c x
3 worst d y
4 best d y
5 worst d y
6 best d z
7 best d z
a b c
0 best a x
1 worst b y
6 best d z
I would like to give precedence to the duplicate with column a equal to best. Which would give the result:
a b c
0 best a x
4 best d y
6 best d z
Any idea what is the correct way to do this in pandas? Is there a more general way than just sorting the rows such that removing all but the first occurrence of the duplicate does what you want?
I think a more straightforward way is to first sort the DataFrame, then drop duplicates keeping the first entry. This is pretty robust (here, 'a' was a string with two values but you could apply a function that makes an integer column from the string if there were more string values to sort).
x = x.sort_values(['a']).drop_duplicates(cols='c')
I think using two groupby statements will get you what you want. With slightly modified input data:
x = pd.DataFrame([
['best', 'a', 'x'],
['worst', 'b', 'y'],
['best', 'c', 'x'],
['worst','d', 'y'],
['worst','d', 'y'],
['best','d', 'y'],
['best','d', 'z'],
['best','d', 'z'],
], columns=['a', 'b', 'c'])
x.groupby(by=['c']) \
.filter(lambda g: g['a'] == 'best') \
.groupby(by=['b'], as_index=False) \
.first() \
.sort(axis=1) # the columns get out of order in the second groupby
Which returns:
b a c
0 a best x
1 c best x
2 d best z
It's still not 100% clear where this needs to go with your ambiguous example input/output. But I think we're getting close.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With