
define aggfunc for each values column in pandas pivot table


I was trying to generate a pivot table with multiple "values" columns. I know I can use aggfunc to aggregate the values the way I want, but what if I don't want to sum or average both columns, and instead want the sum of one column and the mean of the other? Is it possible to do this in pandas?

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        'A': ['one', 'one', 'two', 'three'] * 6,
        'B': ['A', 'B', 'C'] * 8,
        'C': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 4,
        'D': np.random.randn(24),
        'E': np.random.randn(24),
    })

Now this will get a pivot table with sum:

    pd.pivot_table(df, values=['D','E'], index=['B'], aggfunc=np.sum)

And this for mean:

    pd.pivot_table(df, values=['D','E'], index=['B'], aggfunc=np.mean)

How can I get sum for D and mean for E?

Hope my question is clear enough.

VIKASH JAISWAL asked Nov 21 '13 11:11




2 Answers

You can apply a specific function to a specific column by passing in a dict.

    pd.pivot_table(df, values=['D','E'], index=['B'], aggfunc={'D':np.sum, 'E':np.mean})
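
A self-contained sketch of the dict form, handy for checking it on your own install. It assumes a pandas version in which the grouping keyword is `index`, and it uses the string names 'sum' and 'mean' instead of the NumPy functions. The small frame and its values are made up for illustration and are not the question's random data.

```python
import pandas as pd

# Tiny illustrative frame (made-up values, chosen so the results are easy to check).
df = pd.DataFrame({
    'B': ['A', 'B', 'C', 'A', 'B', 'C'],
    'D': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    'E': [10.0, 20.0, 30.0, 30.0, 40.0, 50.0],
})

# One aggregation per values column: sum of D, mean of E.
result = pd.pivot_table(
    df,
    values=['D', 'E'],
    index=['B'],
    aggfunc={'D': 'sum', 'E': 'mean'},
)
print(result)
```

Each key of the dict must be one of the columns listed in values; columns without an entry are not aggregated.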
DataSwede answered Sep 22 '22 03:09


You can concat two DataFrames:

    >>> df1 = pd.pivot_table(df, values=['D'], index=['B'], aggfunc=np.sum)
    >>> df2 = pd.pivot_table(df, values=['E'], index=['B'], aggfunc=np.mean)
    >>> pd.concat((df1, df2), axis=1)
              D         E
    B
    A  1.810847 -0.524178
    B  2.762190 -0.443031
    C  0.867519  0.078460

Or you can pass a list of functions as the aggfunc parameter and then select the columns you need:

    >>> df3 = pd.pivot_table(df, values=['D','E'], index=['B'], aggfunc=[np.sum, np.mean])
    >>> df3
            sum                mean
              D         E         D         E
    B
    A  1.810847 -4.193425  0.226356 -0.524178
    B  2.762190 -3.544245  0.345274 -0.443031
    C  0.867519  0.627677  0.108440  0.078460
    >>> df3 = df3.loc[:, [('sum', 'D'), ('mean','E')]]
    >>> df3.columns = ['D', 'E']
    >>> df3
              D         E
    B
    A  1.810847 -0.524178
    B  2.762190 -0.443031
    C  0.867519  0.078460
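
The list-of-functions route can be sketched as a standalone script. This is an illustration under assumptions, not the answer's exact session: it uses made-up data, the `index` keyword, string function names, and `.loc` with a list of (function, column) tuples to pick columns out of the resulting MultiIndex.

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({
    'B': ['A', 'B', 'C', 'A', 'B', 'C'],
    'D': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    'E': [10.0, 20.0, 30.0, 30.0, 40.0, 50.0],
})

# Every function is applied to every values column, so the columns
# become a MultiIndex of (function name, column name) pairs.
both = pd.pivot_table(df, values=['D', 'E'], index=['B'],
                      aggfunc=['sum', 'mean'])

# Keep only the sum of D and the mean of E, then flatten the column labels.
picked = both.loc[:, [('sum', 'D'), ('mean', 'E')]]
picked.columns = ['D', 'E']
print(picked)
```

This computes two aggregations that are then thrown away, so it is mostly useful when you want to inspect all combinations first.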

Although it would be nice to have an option to define aggfunc for each column individually. I don't know how it could be done; maybe by passing a dict-like parameter into aggfunc, like {'D':np.mean, 'E':np.sum}.

Update: actually, in your case you can pivot by hand:

    >>> df.groupby('B').aggregate({'D':np.sum, 'E':np.mean})
              E         D
    B
    A -0.524178  1.810847
    B -0.443031  2.762190
    C  0.078460  0.867519
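
A minimal sketch of the same "pivot by hand" idea on made-up data, compared against the dict form of pivot_table. The named-aggregation variant (with the hypothetical output names `D_sum` and `E_mean`) is an addition that assumes a pandas version supporting keyword arguments to `agg`; it is not part of the original answer.

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({
    'B': ['A', 'B', 'C', 'A', 'B', 'C'],
    'D': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    'E': [10.0, 20.0, 30.0, 30.0, 40.0, 50.0],
})

# Group on B and give each column its own aggregation.
by_hand = df.groupby('B').agg({'D': 'sum', 'E': 'mean'})

# Same result with explicit output column names (named aggregation).
named = df.groupby('B').agg(D_sum=('D', 'sum'), E_mean=('E', 'mean'))

# For comparison: the pivot_table call with a dict aggfunc.
pivoted = pd.pivot_table(df, values=['D', 'E'], index='B',
                         aggfunc={'D': 'sum', 'E': 'mean'})

print(by_hand)
print(named)
```

With a single grouping key and no columns argument, the groupby form and the pivot table produce the same numbers, so which one to use is mostly a matter of taste.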
Roman Pekar answered Sep 24 '22 03:09