Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Extract row with maximum value in a group pandas dataframe

Tags:

python

pandas

A similar question is asked here: Python : Getting the Row which has the max value in groups using groupby

However, I just need one record per group even if there are more than one record with maximum value in that group.

In the example below, I need one record for "s2". For me it doesn't matter which one.

>>> df = DataFrame({'Sp':['a','b','c','d','e','f'], 'Mt':['s1', 's1', 's2','s2','s2','s3'], 'Value':[1,2,3,4,5,6], 'count':[3,2,5,10,10,6]})
>>> df
   Mt Sp  Value  count
0  s1  a      1      3
1  s1  b      2      2
2  s2  c      3      5
3  s2  d      4     10
4  s2  e      5     10
5  s3  f      6      6
>>> idx = df.groupby(['Mt'])['count'].transform(max) == df['count']
>>> df[idx]
   Mt Sp  Value  count
0  s1  a      1      3
3  s2  d      4     10
4  s2  e      5     10
5  s3  f      6      6
>>> 
like image 416
user1140126 Avatar asked Nov 06 '13 17:11

user1140126


People also ask

How do you get max rows in pandas?

Find Maximum Element in Pandas DataFrame's RowIf the axis equals to 0, the max() method will find the max element of each column. On the other hand, if the axis equals to 1, the max() will find the max element of each row.


2 Answers

You can use first

In [14]: df.groupby('Mt').first()
Out[14]: 
   Sp  Value  count
Mt                 
s1  a      1      3
s2  c      3      5
s3  f      6      6

Update

Set as_index=False to achieve your goal

In [28]: df.groupby('Mt', as_index=False).first()
Out[28]: 
   Mt Sp  Value  count
0  s1  a      1      3
1  s2  c      3      5
2  s3  f      6      6 

Update Again

Sorry for misunderstanding what you mean. You can sort it first if you want the one with max count in a group

In [196]: df.sort('count', ascending=False).groupby('Mt', as_index=False).first()
Out[196]: 
   Mt Sp  Value  count
0  s1  a      1      3
1  s2  e      5     10
2  s3  f      6      6
like image 94
waitingkuo Avatar answered Oct 11 '22 15:10

waitingkuo


To get first occurence of maximum count you can use pandas.DataFrame.idxmax() function:

>>> df.iloc[df.groupby(['Mt']).apply(lambda x: x['count'].idxmax())]
   Mt Sp  Value  count
0  s1  a      1      3
3  s2  d      4     10
5  s3  f      6      6
like image 19
Roman Pekar Avatar answered Oct 11 '22 15:10

Roman Pekar