Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

pandas groupby aggregate with grand total in the bottom

Here is my code:

import StringIO
from pandas import *
import numpy as np

df = read_csv(StringIO.StringIO('''Col1 Col2    A   B
A   D   1   6
A   E   2   7
B   D   3   8
B   E   4   9
C   D   5   19'''), delimiter='\t')


df['buc1'] = cut(df['A'], bins = [0, 2, 6, 8])

aggFunc = {'A': sum,
           'B': np.average
           }

After running:

df.groupby(['buc1']).agg(aggFunc)

I get:

         A     B
buc1            
(0, 2]   3   6.5
(2, 6]  12  12.0
(6, 8] NaN   NaN

My questions are:

  1. How do I get rid of the bottom (6, 8] bucket, given that there are no values there?
  2. How do I add a grand total row at the bottom, as in a pivot table in Excel?

For instance:

buc1    A   B
(0, 2]  3   6.5
(2, 6]  12  12
Total   15  9.8

Note that the total row for the second column would be the average, not the sum.

like image 936
Jason Avatar asked Oct 27 '15 17:10

Jason


1 Answers

To just drop the na records, you can use the .dropna() dataframe method.

df['buc1'] = df['buc1'].astype(object) 
result = df.groupby(['buc1']).agg(aggFunc).dropna()
result


         A     B
buc1            
(0, 2]   3   6.5
(2, 6]  12  12.0

As for the marginal totals, in theory this should work:

result_alternative = pd.pivot_table(df,index='buc1',
                                    aggfunc=aggFunc,
                                    values=['A','B'],
                                    margins=True)
result_alternative

         A     B
buc1            
(0, 2]   3   6.5
(2, 6]  12  12.0
All     15   9.8

But it throws an error related to , which I think is a bug that should be fixed so it may work on your machine (although I am running version 0.17)

That same link has a workaround related to the categorical variable--just cast it as an object first, and the above call should be good.

EDIT:

If you want to stay with the groupby semantics, you can add a row total like so:

rowtotal = result.sum()
rowtotal.name = 'rowtotal'
result.append(rowtotal)
like image 109
ako Avatar answered Oct 01 '22 03:10

ako