Pandas pivot table Percent Calculations

Given the following data frame and pivot table:

import pandas as pd
df = pd.DataFrame({'A': ['x', 'y', 'z', 'x', 'y', 'z'],
                   'B': ['one', 'one', 'one', 'two', 'two', 'two'],
                   'C': [2, 18, 2, 8, 2, 18]})
df

    A   B       C
0   x   one     2
1   y   one     18
2   z   one     2
3   x   two     8
4   y   two     2
5   z   two     18

table = pd.pivot_table(df, index=['A', 'B'], aggfunc='sum')

            C
A   B   
x   one     2
    two     8
y   one     18
    two     2
z   one     2
    two     18

I'd like to add two columns to this pivot table: one showing each value as a percent of the grand total, and another showing each value as a percent within its group in column A, like this:

           C    % of Total  % of B
A   B
x   one    2    4%          20%
    two    8    16%         80%
y   one   18    36%         90%
    two    2    4%          10%
z   one    2    4%          10%
    two   18    36%         90%

Extra Credit:

I'd like a bottom summary row which has the sum of column C (it's okay if it also has 100% for the next 2 columns, but nothing is needed for those).

asked May 10 '16 20:05

2 Answers

You can use:

table['% of Total'] = (table.C / table.C.sum() * 100).astype(str) + '%'
table['% of B'] = (table.C / table.groupby(level=0).C.transform('sum') * 100).astype(str) + '%'
print(table)
        C % of Total % of B
A B                        
x one   2       4.0%  20.0%
  two   8      16.0%  80.0%
y one  18      36.0%  90.0%
  two   2       4.0%  10.0%
z one   2       4.0%  10.0%
  two  18      36.0%  90.0%

With real data, though, I don't recommend casting to int, because that truncates the fractional part; it is better to use round.
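As a minimal sketch of the rounding suggestion, the percentages can be rounded before they are turned into strings. The names `pct_total` and `pct_group` are illustrative only, and one decimal place is an arbitrary choice. Casting to int would turn a value such as 66.67 into 66, whereas rounding keeps the nearest value at the chosen precision.

```python
import pandas as pd

df = pd.DataFrame({'A': ['x', 'y', 'z', 'x', 'y', 'z'],
                   'B': ['one', 'one', 'one', 'two', 'two', 'two'],
                   'C': [2, 18, 2, 8, 2, 18]})
table = df.groupby(['A', 'B']).sum()

# Percent of the grand total, and percent within each group of A.
pct_total = table['C'] / table['C'].sum() * 100
pct_group = table['C'] / table.groupby(level='A')['C'].transform('sum') * 100

# Round to one decimal place first, then format as text with a percent sign.
table['% of Total'] = pct_total.round(1).astype(str) + '%'
table['% of B'] = pct_group.round(1).astype(str) + '%'
print(table)
```

Keeping the numeric series separate from the formatted columns means the numbers stay available for further calculations.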

Extra Credit:

table['% of Total'] = (table.C / table.C.sum() * 100)
table['% of B'] = (table.C / table.groupby(level=0).C.transform('sum') * 100)
table.loc['total', :] = table.sum().values
print(table)
              C  % of Total  % of B
A     B                            
x     one   2.0         4.0    20.0
      two   8.0        16.0    80.0
y     one  18.0        36.0    90.0
      two   2.0         4.0    10.0
z     one   2.0         4.0    10.0
      two  18.0        36.0    90.0
total      50.0       100.0   300.0
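Summing every column gives 300.0 for % of B in the total row above, because the three per-group percentages each add up to 100. A possible alternative, sketched here under the assumption that 100 is wanted in both percent columns, is to build the summary row explicitly and append it with `pd.concat`. The names `total_row` and `with_total` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'A': ['x', 'y', 'z', 'x', 'y', 'z'],
                   'B': ['one', 'one', 'one', 'two', 'two', 'two'],
                   'C': [2, 18, 2, 8, 2, 18]})
table = df.groupby(['A', 'B']).sum()
table['% of Total'] = table['C'] / table['C'].sum() * 100
table['% of B'] = table['C'] / table.groupby(level='A')['C'].transform('sum') * 100

# One-row frame with the same columns and a matching two-level index.
# The percent values are set by hand (an assumption), not summed.
total_row = pd.DataFrame(
    {'C': [table['C'].sum()], '% of Total': [100.0], '% of B': [100.0]},
    index=pd.MultiIndex.from_tuples([('total', '')], names=['A', 'B']),
)

# Append the summary row below the existing rows.
with_total = pd.concat([table, total_row])
print(with_total)
```

Building the row separately also keeps column C as integers, since no row of mixed values is assigned into the existing frame.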

answered Nov 02 '22 17:11

Note that in the OP's particular example, the columns parameter of pivot_table is not used, so pivot_table is equivalent to groupby. An equivalent (and possibly faster) way to produce the initial pivot table is therefore

table = df.groupby(['A','B']).sum()
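The equivalence can be checked on the example data with a short sketch like the one below. The names `pivot` and `grouped` are illustrative, and the check only covers this data set, not every possible input.

```python
import pandas as pd

df = pd.DataFrame({'A': ['x', 'y', 'z', 'x', 'y', 'z'],
                   'B': ['one', 'one', 'one', 'two', 'two', 'two'],
                   'C': [2, 18, 2, 8, 2, 18]})

# Same aggregation expressed two ways.
pivot = pd.pivot_table(df, index=['A', 'B'], aggfunc='sum')
grouped = df.groupby(['A', 'B']).sum()

# Compare the two results; equals() checks both the labels and the values.
print(pivot.equals(grouped))
```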

answered Nov 02 '22 19:11

cottontail


Tags:

python-3.x

pandas

pivot-table

percentage

2 Answers

jezrael

cottontail
