Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas group by operations on a data frame

Tags:

python

pandas

I have a pandas data frame like the one below.

UsrId   JobNos
 1       4
 1       56
 2       23 
 2       55
 2       41
 2       5
 3       78
 1       25
 3       1

I group by the data frame based on the UsrId. The grouped data frame will conceptually look like below.

UsrId   JobNos
  1    [4,56,25]
  2    [23,55,41,5]
  3    [78,1]

Now, I'm looking for an in-build API that will give me the UsrId with the maximum job count. For the above example, UsrId-2 has the maximum count.

UPDATE: Instead of the UsrID with maximum job count, I want n UserIds with maximum job counts. For the above example, if n=2 then the output is [2,1]. Can this be done?

like image 634
Anirudh Jayakumar Avatar asked May 22 '13 05:05

Anirudh Jayakumar


1 Answers

Something like df.groupby('UsrId').JobNos.sum().idxmax() should do it:

In [1]: import pandas as pd

In [2]: from StringIO import StringIO

In [3]: data = """UsrId   JobNos
   ...:  1       4
   ...:  1       56
   ...:  2       23 
   ...:  2       55
   ...:  2       41
   ...:  2       5
   ...:  3       78
   ...:  1       25
   ...:  3       1"""

In [4]: df = pd.read_csv(StringIO(data), sep='\s+')

In [5]: grouped = df.groupby('UsrId')

In [6]: grouped.JobNos.sum()
Out[6]: 
UsrId
1         85
2        124
3         79
Name: JobNos

In [7]: grouped.JobNos.sum().idxmax()
Out[7]: 2

If you want your results based on the number of items in each group:

In [8]: grouped.size()
Out[8]: 
UsrId
1        3
2        4
3        2

In [9]: grouped.size().idxmax()
Out[9]: 2

Update: To get ordered results you can use the .order method:

In [10]: grouped.JobNos.sum().order(ascending=False)
Out[10]: 
UsrId
2        124
1         85
3         79
Name: JobNos
like image 155
root Avatar answered Oct 14 '22 23:10

root