Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas keep duplicated with highest value

I have data similar to:

id value duplicate
a   200  yes
a   12   yes
b   42   yes
c   12   no
b   532  yes
b   21   yes
...

To track the duplicates I use df['duplicate'] = df.duplicated('id', keep=False) However, I would like to keep the ones with the highest value and either mark or drop the other duplicates. Any suggestions?

like image 824
As3adTintin Avatar asked Oct 29 '15 20:10

As3adTintin


People also ask

Can Pandas index have duplicates?

Indicate duplicate index values. Duplicated values are indicated as True values in the resulting array. Either all duplicates, all except the first, or all except the last occurrence of duplicates can be indicated. The value or values in a set of duplicates to mark as missing.

Does Pandas drop duplicates keep first?

Only consider certain columns for identifying duplicates, by default use all of the columns. Determines which duplicates (if any) to keep. - first : Drop duplicates except for the first occurrence.

How do I select duplicates in Pandas?

You can use the duplicated() function to find duplicate values in a pandas DataFrame.


1 Answers

Ah I don't know why I didn't think of this first. df.sort(['id', 'value']) df['is_duplicated'] = df.duplicated('id', keep='first')

sorry!

like image 75
As3adTintin Avatar answered Sep 18 '22 01:09

As3adTintin