
Drop duplicates keeping the row with the highest value in another column

Tags: python, pandas

import pandas as pd
a = [['John', 10], ['Mary', 22], ['John', 50]]  # one [Name, Count] pair per row
df1 = pd.DataFrame(a, columns=['Name', 'Count'])

Given a DataFrame like this, I want to compare the "Count" values of all rows that share the same "Name" and keep only the highest one. I'm not sure how to do this with a DataFrame in Python.

Ex: In the case above, the answer would be:

Name  Count
Mary     22
John     50

The lower value John 10 has been dropped (I only want to see the highest value of "Count" based on the same value for "Name").

In SQL it would be something like a Select Case query (wherein I select the case where Name == Name and Count > Count recursively to determine the highest number), or a for loop over each name. But as I understand it, looping over a DataFrame is a bad idea due to the nature of the object.

Is there a way to do this with a DataFrame in Python? I could create a new DataFrame for each name (one with only John, for example) and then get the highest value (df.value()[:1] or similar), but as I have many hundreds of unique entries that seems like a terrible solution. :D

Kafka asked Jul 21 '18 20:07





1 Answer

Either use sort_values and drop_duplicates:

df1.sort_values('Count').drop_duplicates('Name', keep='last')

   Name  Count
1  Mary     22
2  John     50
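
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the sort-then-deduplicate approach. It assumes the sample data is laid out as one [Name, Count] pair per row; the variable name `highest` is only illustrative.

```python
import pandas as pd

# Sample data from the question, one [Name, Count] pair per row.
df1 = pd.DataFrame(
    [['John', 10], ['Mary', 22], ['John', 50]],
    columns=['Name', 'Count'],
)

# Sort ascending by Count so each name's largest value comes last,
# then keep only the last occurrence of every name.
highest = df1.sort_values('Count').drop_duplicates('Name', keep='last')
print(highest)
```

The original index labels are preserved, which is why the surviving rows are labelled 1 and 2 rather than 0 and 1. Sorting with ascending=False and using keep='first' should give the same rows in the opposite order.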

Or, like miradulo said, use groupby and max:

df1.groupby('Name')['Count'].max().reset_index()

   Name  Count
0  John     50
1  Mary     22
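
Note that groupby and max returns only the grouping column and the aggregated column. If the frame has other columns you want to carry along, one possible variant (not part of the answer above) is to look up the winning rows with idxmax. The sketch below assumes a unique index, and the 'City' column and the names `df2` and `best_rows` are hypothetical, added purely for illustration.

```python
import pandas as pd

# Hypothetical extra column 'City' added to show that whole rows are kept.
df2 = pd.DataFrame({
    'Name': ['John', 'Mary', 'John'],
    'Count': [10, 22, 50],
    'City': ['Oslo', 'Rome', 'Lima'],
})

# idxmax gives, for each name, the index label of the row with the largest Count;
# .loc then selects those complete rows (this relies on the index being unique).
best_rows = df2.loc[df2.groupby('Name')['Count'].idxmax()]
print(best_rows)
```

Rows come back in group order (sorted by Name), each with its original index label.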
cs95 answered Sep 28 '22 20:09
