When I drop <code>John</code> as duplicate specifying 'name' as the column name: <pre class="prettyprint"><code>import pandas as pd data = {'name':['Bill','Steve','John','John','John'], 'age':[21,28,22,30,29]} df = pd.DataFrame(data) df = df.drop_duplicates('name') </code></pre> pandas drops all matching entities leaving the left-most: <pre class="prettyprint"><code> age name 0 21 Bill 1 28 Steve 2 22 John </code></pre> Instead I would like to keep the row where John's age is the highest (in this example it is the age 30. How to achieve this?

Try this: <pre class="prettyprint"><code>In [75]: df Out[75]: age name 0 21 Bill 1 28 Steve 2 22 John 3 30 John 4 29 John In [76]: df.sort_values('age').drop_duplicates('name', keep='last') Out[76]: age name 0 21 Bill 1 28 Steve 3 30 John </code></pre> or this depending on your goals: <pre class="prettyprint"><code>In [77]: df.drop_duplicates('name', keep='last') Out[77]: age name 0 21 Bill 1 28 Steve 4 29 John </code></pre>

How to drop duplicate from DataFrame taking into account value of another column

Tags:

When I drop John as duplicate specifying 'name' as the column name:

import pandas as pd   
data = {'name':['Bill','Steve','John','John','John'], 'age':[21,28,22,30,29]}
df = pd.DataFrame(data)
df = df.drop_duplicates('name')

pandas drops all matching entities leaving the left-most:

   age   name
0   21   Bill
1   28  Steve
2   22   John

Instead I would like to keep the row where John's age is the highest (in this example it is the age 30. How to achieve this?

626

asked Oct 16 '16 23:10

alphanumeric

1 Answers

Try this:

In [75]: df
Out[75]:
   age   name
0   21   Bill
1   28  Steve
2   22   John
3   30   John
4   29   John

In [76]: df.sort_values('age').drop_duplicates('name', keep='last')
Out[76]:
   age   name
0   21   Bill
1   28  Steve
3   30   John

or this depending on your goals:

In [77]: df.drop_duplicates('name', keep='last')
Out[77]:
   age   name
0   21   Bill
1   28  Steve
4   29   John

131

answered Sep 26 '22 16:09

MaxU - stop WAR against UA

Related questions
                            
                                Count workers in business with branches
                            
                                Coffeescript array.sort(a,b) generates failing JS
                            
                                Why getting Memory Error? Python
                            
                                Put list into dict with first row header as keys
                            
                                G Suite identity provider for an AWS driven browser based App
                            
                                How to replace each array element by 4 copies in Python?
                            
                                Integrating Auth0 authentication with existing user database
                            
                                Azure web app - Additional deployment slots affect production slot?
                            
                                CheckedListBox DataSource suddenly not working
                            
                                Php openSLL Encryption - to prevent special characters
                            
                                Adsense responsive: no slot size for availableWidth=0
                            
                                SQL Server database - How to handle a table which COULD be linked to any number of other tables?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With