Python Dataframe select rows based on max values in one of the columns

Tags: python, pandas

I have a DataFrame in Python (many rows, 2 columns). I want to reduce it to one row per unique value in column 1, keeping the row with the largest value in column 2 (column 2 is sorted in ascending order within each ID, if that helps). I could probably write a loop but would prefer a one- or two-line solution. Thanks.

Ex.

ID         Value
100       11
100       14
100       16
200       10
200       20
200       30
300       45
400        0
400       25

desired result

100       16
200       30
300       45
400       25
jim g asked Mar 28 '17 13:03


2 Answers

You want to group by column 'a' (ID in your example), get the index label of the max value of 'b' (Value) in each group using idxmax, and then use those labels with loc to select rows from the original df:

In [12]:
df.loc[df.groupby('a')['b'].idxmax()]

Out[12]:
     a   b
2  100  16
5  200  30
6  300  45
8  400  25
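
For reference, the same approach can be written as a self-contained script. This is only a sketch: it rebuilds the sample data from the question and assumes the columns are named "ID" and "Value" as in the question's table, rather than 'a' and 'b' as above. The variable names max_labels and result are illustrative.

```python
import pandas as pd

# Sample data copied from the question; column names "ID" and "Value"
# are assumed from the question's table.
df = pd.DataFrame({
    "ID":    [100, 100, 100, 200, 200, 200, 300, 400, 400],
    "Value": [11, 14, 16, 10, 20, 30, 45, 0, 25],
})

# For each ID, idxmax() gives the index label of the row with the largest Value.
max_labels = df.groupby("ID")["Value"].idxmax()

# .loc selects those rows from the original frame, keeping every column
# and the original index labels.
result = df.loc[max_labels]
print(result)
```

Because the rows are selected from the original frame, any additional columns would come along unchanged.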
EdChum answered Nov 12 '22 02:11


If you don't need the original index and only want the highest Value per ID, groupby and max are enough:

print(df.groupby("ID").max())

     Value
ID  
100     16
200     30
300     45
400     25
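
A runnable version of this, as a sketch on the question's sample data, might look like the following. It also shows a second option that leans on the question's remark that Value is already sorted in ascending order within each ID; that ordering is an assumption, and the drop_duplicates variant gives wrong results if it does not hold. The names by_max and by_last are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID":    [100, 100, 100, 200, 200, 200, 300, 400, 400],
    "Value": [11, 14, 16, 10, 20, 30, 45, 0, 25],
})

# as_index=False keeps ID as a regular column instead of moving it to the index.
by_max = df.groupby("ID", as_index=False)["Value"].max()
print(by_max)

# Alternative that ASSUMES Value is ascending within each ID:
# the last row of each ID is then the one with the largest Value.
by_last = df.drop_duplicates(subset="ID", keep="last").reset_index(drop=True)
print(by_last)
```

Note that max() is computed per column, so on a frame with more columns the values in one output row can come from different input rows; selecting whole rows avoids that.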
pansen answered Nov 12 '22 02:11