Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to drop duplicate from DataFrame taking into account value of another column

Tags:

When I drop John as duplicate specifying 'name' as the column name:

import pandas as pd   
data = {'name':['Bill','Steve','John','John','John'], 'age':[21,28,22,30,29]}
df = pd.DataFrame(data)
df = df.drop_duplicates('name')

pandas drops all matching entities leaving the left-most:

   age   name
0   21   Bill
1   28  Steve
2   22   John

Instead I would like to keep the row where John's age is the highest (in this example it is the age 30. How to achieve this?

like image 626
alphanumeric Avatar asked Oct 16 '16 23:10

alphanumeric


People also ask

How do you drop duplicates in a DataFrame based on one column?

To remove duplicates of only one or a subset of columns, specify subset as the individual column or list of columns that should be unique. To do this conditional on a different column's value, you can sort_values(colname) and specify keep equals either first or last .

How do you drop duplicates in Pandas with conditions?

Pandas drop_duplicates() Function Syntax keep: allowed values are {'first', 'last', False}, default 'first'. If 'first', duplicate rows except the first one is deleted. If 'last', duplicate rows except the last one is deleted. If False, all the duplicate rows are deleted.


1 Answers

Try this:

In [75]: df
Out[75]:
   age   name
0   21   Bill
1   28  Steve
2   22   John
3   30   John
4   29   John

In [76]: df.sort_values('age').drop_duplicates('name', keep='last')
Out[76]:
   age   name
0   21   Bill
1   28  Steve
3   30   John

or this depending on your goals:

In [77]: df.drop_duplicates('name', keep='last')
Out[77]:
   age   name
0   21   Bill
1   28  Steve
4   29   John
like image 131
MaxU - stop WAR against UA Avatar answered Sep 26 '22 16:09

MaxU - stop WAR against UA