How to use two different functions within crosstab/pivot_table in pandas?

Using pandas, is it possible to compute a single cross-tabulation (or pivot table) containing values calculated from two different functions?

import pandas as pd
import numpy as np

# Two row keys (c1, c2) and two column keys (c3, c4), 100 rows in total
c1 = np.repeat(['a', 'b'], [50, 50])
c2 = list('xy' * 50)
c3 = np.repeat(['G1', 'G2'], [50, 50])
np.random.shuffle(c3)
c4 = np.repeat([1, 2], [50, 50])
np.random.shuffle(c4)
val = np.random.rand(100)

df = pd.DataFrame({'c1': c1, 'c2': c2, 'c3': c3, 'c4': c4, 'val': val})

# Number of rows in each (c1, c2) x (c3, c4) cell
frequencyTable = pd.crosstab([df.c1, df.c2], [df.c3, df.c4])
# Mean of val in each cell
meanVal = pd.crosstab([df.c1, df.c2], [df.c3, df.c4], values=df.val, aggfunc=np.mean)

Both tables have the same rows and columns, but what I'd really like is a single table containing both the frequencies and the mean values:

c3           G1                       G2          
c4     1              2              1              2
c1 c2  freq val       freq val       freq val       freq val         
a  x   6    0.624931  5    0.582268  8    0.528231  6    0.362804
   y   7    0.493890  8    0.465741  3    0.613126  7    0.312894
b  x   9    0.488255  5    0.804015  6    0.722640  5    0.369480
   y   6    0.462653  4    0.506791  5    0.583695  10   0.517954

asked Sep 04 '13 17:09

HappyPy

1 Answer

You can pass a list of functions to aggfunc:

pd.crosstab([df.c1,df.c2], [df.c3,df.c4], values=df.val, aggfunc=[len, np.mean])
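To see what the list form returns, here is a minimal sketch on a small hand-made frame. The name df_small and its values are made up for illustration, and the string "mean" is used in place of np.mean, which recent pandas versions recommend. Each function in the list should contribute its own outermost column level, named after the function:

```python
import pandas as pd

# Hypothetical toy data, small enough to check the numbers by eye
df_small = pd.DataFrame({
    "c1": ["a", "a", "a", "b", "b", "b"],
    "c2": ["x", "x", "y", "x", "y", "y"],
    "c3": ["G1", "G1", "G2", "G1", "G2", "G2"],
    "val": [1.0, 3.0, 5.0, 2.0, 4.0, 8.0],
})

# One crosstab, two aggregations: row count (len) and mean of val
both = pd.crosstab(
    [df_small.c1, df_small.c2],
    df_small.c3,
    values=df_small.val,
    aggfunc=[len, "mean"],
)

# The function names form the outermost column level, above c3
print(both.columns.get_level_values(0).unique().tolist())
print(both)
```

Note that len counts every row in a cell, including rows where val is NaN; if missing values should be excluded from the frequency, "count" may be the better choice.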

If you want the table as shown in your question, you will have to rearrange the levels a bit:

In [42]: table = pd.crosstab([df.c1,df.c2], [df.c3,df.c4], values=df.val, aggfunc=[len, np.mean])

In [43]: table
Out[43]: 
       len                mean                              
c3      G1     G2           G1                  G2          
c4       1  2   1  2         1         2         1         2
c1 c2                                                       
a  x     4  6   8  7  0.303036  0.414474  0.624900  0.425234
   y     5  5   8  7  0.543363  0.480419  0.583499  0.637657
b  x    10  6   4  5  0.400279  0.436929  0.442924  0.287572
   y     6  8   5  6  0.400427  0.623319  0.764506  0.408708

In [44]: table.reorder_levels([1, 2, 0], axis=1).sort_index(axis=1)
Out[44]: 
c3      G1                            G2                         
c4       1              2              1              2          
       len      mean  len      mean  len      mean  len      mean
c1 c2                                                            
a  x     4  0.303036    6  0.414474    8  0.624900    7  0.425234
   y     5  0.543363    5  0.480419    8  0.583499    7  0.637657
b  x    10  0.400279    6  0.436929    4  0.442924    5  0.287572
   y     6  0.400427    8  0.623319    5  0.764506    6  0.408708
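Since the question also mentions pivot_table, the same idea can be sketched with DataFrame.pivot_table, followed by a rename so that the innermost labels read freq and val as in the question. The frame df_small and the names tidy, freq and val are assumptions chosen for this illustration, not part of the original answer:

```python
import pandas as pd

# Hypothetical toy data, small enough to check the numbers by eye
df_small = pd.DataFrame({
    "c1": ["a", "a", "a", "b", "b", "b"],
    "c2": ["x", "x", "y", "x", "y", "y"],
    "c3": ["G1", "G1", "G2", "G1", "G2", "G2"],
    "val": [1.0, 3.0, 5.0, 2.0, 4.0, 8.0],
})

# pivot_table also accepts a list of aggregations
table = df_small.pivot_table(
    index=["c1", "c2"],
    columns="c3",
    values="val",
    aggfunc=["count", "mean"],
)

tidy = (
    table
    .rename(columns={"count": "freq", "mean": "val"}, level=0)  # relabel the function level
    .reorder_levels([1, 0], axis=1)                              # move c3 to the top
    .sort_index(axis=1)                                          # group freq/val under each c3 value
)
print(tidy)
```

Cells with no matching rows come out as NaN in both the freq and val columns; passing fill_value to pivot_table is one way to change that, though it would fill the mean columns as well.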

answered Sep 27 '22 20:09

joris

Tags: python, merge, pandas, pivot-table, crosstab
