Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas pivot table selecting rows with maximum values

I have pandas dataframe as:

df

Id      Name        CaseId       Value 
82      A1          case1.01     37.71 
1558    A3          case1.01     27.71 
82      A1          case1.06     29.54 
1558    A3          case1.06     29.54 
82      A1          case1.11     12.09 
1558    A3          case1.11     32.09 
82      A1          case1.16     33.35 
1558    A3          case1.16     33.35 

For each Id, Name pair I need to select the CaseId with maximum value.

i.e. I am seeking the following output:

Id      Name        CaseId       Value 
82      A1          case1.01     37.71
1558    A3          case1.16     33.35

I tried the following:

import pandas as pd
pd.pivot_table(df, index=['Id', 'Name'], columns=['CaseId'], values=['Value'], aggfunc=[np.max])['amax']

But all it does is for each CaseId as column it gives maximum value and not the results that I am seeking above.

like image 906
Zanam Avatar asked May 21 '18 15:05

Zanam


2 Answers

sort_values + drop_duplicates

df.sort_values('Value').drop_duplicates(['Id'],keep='last')
Out[93]: 
     Id Name    CaseId  Value
7  1558   A3  case1.16  33.35
0    82   A1  case1.01  37.71

Since we post same time , adding more method

df.sort_values('Value').groupby('Id').tail(1)
Out[98]: 
     Id Name    CaseId  Value
7  1558   A3  case1.16  33.35
0    82   A1  case1.01  37.71
like image 143
BENY Avatar answered Oct 17 '22 14:10

BENY


This should work:

df = df.sort_values('Value', ascending=False).drop_duplicates('Id').sort_index()

Output:

     Id Name    CaseId  Value
0    82   A1  case1.01  37.71
7  1558   A3  case1.16  33.35
like image 41
Joe Avatar answered Oct 17 '22 13:10

Joe