Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to use two different functions within crosstab/pivot_table in pandas?

Using pandas, is it possible to compute a single cross-tabulation (or pivot table) containing values calculated from two different functions?

import pandas as pd
import numpy as np

c1 = np.repeat(['a','b'], [50, 50], axis=0)
c2 = list('xy'*50)
c3 = np.repeat(['G1','G2'], [50, 50], axis=0)
np.random.shuffle(c3)
c4=np.repeat([1,2], [50,50],axis=0)
np.random.shuffle(c4)
val = np.random.rand(100)

df = pd.DataFrame({'c1':c1, 'c2':c2, 'c3':c3, 'c4':c4, 'val':val})

frequencyTable = pd.crosstab([df.c1,df.c2],[df.c3,df.c4])
meanVal = pd.crosstab([df.c1,df.c2],[df.c3,df.c4],values=df.val,aggfunc=np.mean)

So, both the rows and the columns are the same in both tables, but what I'd really like is a table with both frequencies and mean values:

c3           G1                       G2          
c4     1              2              1              2
c1 c2  freq val       freq val       freq val       freq val         
a  x   6    0.624931  5    0.582268  8    0.528231  6    0.362804
   y   7    0.493890  8    0.465741  3    0.613126  7    0.312894
b  x   9    0.488255  5    0.804015  6    0.722640  5    0.369480
   y   6    0.462653  4    0.506791  5    0.583695  10   0.517954
like image 676
HappyPy Avatar asked Sep 04 '13 17:09

HappyPy


People also ask

What is the difference between Groupby and pivot_table in pandas?

What is the difference between the pivot_table and the groupby? The groupby method is generally enough for two-dimensional operations, but pivot_table is used for multi-dimensional grouping operations.

What is the difference between pivot and pivot_table in pandas?

pivot() will error with a ValueError: Index contains duplicate entries, cannot reshape if the index/column pair is not unique. In this case, consider using pivot_table() which is a generalization of pivot that can handle duplicate values for one index/column pair.

What is the functionality of pivot_table function of pandas?

Pandas DataFrame: pivot_table() function The pivot_table() function is used to create a spreadsheet-style pivot table as a DataFrame. The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame.

How do you do cross tabulation in pandas?

Compute a simple cross tabulation of two (or more) factors. By default, computes a frequency table of the factors unless an array of values and an aggregation function are passed. Values to group by in the rows. Values to group by in the columns.


1 Answers

You can give a list of functions:

pd.crosstab([df.c1,df.c2], [df.c3,df.c4], values=df.val, aggfunc=[len, np.mean])

If you want the table as shown in your question, you will have to rearrange the levels a bit:

In [42]: table = pd.crosstab([df.c1,df.c2], [df.c3,df.c4], values=df.val, aggfunc=[len, np.mean])

In [43]: table
Out[43]: 
       len                mean                              
c3      G1     G2           G1                  G2          
c4       1  2   1  2         1         2         1         2
c1 c2                                                       
a  x     4  6   8  7  0.303036  0.414474  0.624900  0.425234
   y     5  5   8  7  0.543363  0.480419  0.583499  0.637657
b  x    10  6   4  5  0.400279  0.436929  0.442924  0.287572
   y     6  8   5  6  0.400427  0.623319  0.764506  0.408708

In [44]: table.reorder_levels([1, 2, 0], axis=1).sort_index(axis=1)
Out[44]: 
c3      G1                            G2                         
c4       1              2              1              2          
       len      mean  len      mean  len      mean  len      mean
c1 c2                                                            
a  x     4  0.303036    6  0.414474    8  0.624900    7  0.425234
   y     5  0.543363    5  0.480419    8  0.583499    7  0.637657
b  x    10  0.400279    6  0.436929    4  0.442924    5  0.287572
   y     6  0.400427    8  0.623319    5  0.764506    6  0.408708
like image 181
joris Avatar answered Sep 27 '22 20:09

joris