Given the following data frame and pivot table:
import pandas as pd
df=pd.DataFrame({'A':['x','y','z','x','y','z'],
'B':['one','one','one','two','two','two'],
'C':[2,18,2,8,2,18]})
df
A B C
0 x one 2
1 y one 18
2 z one 2
3 x two 8
4 y two 2
5 z two 18
table = pd.pivot_table(df, index=['A', 'B'],aggfunc=np.sum)
C
A B
x one 2
two 8
y one 18
two 2
z one 2
two 18
I'd like to add 2 columns to this pivot table; one showing the percent of all values and another for percent within column A like this:
C % of Total % of B
A B
x one 2 4% 20%
two 8 16% 80%
y one 18 36% 90%
two 2 4% 10%
z one 2 4% 10%
two 18 36% 90%
Extra Credit:
I'd like a bottom summary row which has the sum of column C (it's okay if it also has 100% for the next 2 columns, but nothing is needed for those).
You will have the pivot table with the Sales for the Items for each Month. To calculate % of Sales for each month, you need to do the following: Click on pivot builder the entry Sum of Sales and select Value Field Settings. In the Value Field Settings window, on the Show Values As tab, choose % of Column Total.
Right-click anywhere in the % of wins column in the pivot table. Select Value Field Settings > Show Values As > Number Format > Percentage. Click OK twice.
You can caluclate pandas percentage with total by groupby() and DataFrame. transform() method. The transform() method allows you to execute a function for each value of the DataFrame. Here, the percentage directly summarized DataFrame, then the results will be calculated using all the data.
You can use:
table['% of Total'] = (table.C / table.C.sum() * 100).astype(str) + '%'
table['% of B'] = (table.C / table.groupby(level=0).C.transform(sum) * 100).astype(str) + '%'
print table
C % of Total % of B
A B
x one 2 4.0% 20.0%
two 8 16.0% 80.0%
y one 18 36.0% 90.0%
two 2 4.0% 10.0%
z one 2 4.0% 10.0%
two 18 36.0% 90.0%
But with real data I think casting to int
is not recommended, better is use round
.
Extra Credit:
table['% of Total'] = (table.C / table.C.sum() * 100)
table['% of B'] = (table.C / table.groupby(level=0).C.transform(sum) * 100)
table.loc['total', :] = table.sum().values
print table
C % of Total % of B
A B
x one 2.0 4.0 20.0
two 8.0 16.0 80.0
y one 18.0 36.0 90.0
two 2.0 4.0 10.0
z one 2.0 4.0 10.0
two 18.0 36.0 90.0
total 50.0 100.0 300.0
If you want to chain the methods to assign the new columns to pivot_table()
method to put in a pipeline, you can do so using assign()
.
Moreover, you can add the totals as a new row using the margins
parameter of pivot_table
.
table = (
df
.pivot_table(index=['A', 'B'], aggfunc=np.sum, margins=True, margins_name='Total')
.assign(**{
# must exclude the last row (which are the Totals) for sum and group-specific sum
'% of Total': lambda x: x['C'] / x.iloc[:-1]['C'].sum() * 100,
'% of B': lambda x: x['C'] / x.iloc[:-1].groupby(level='A')['C'].transform('sum') * 100
})
)
Note that for the particular example in the OP, as pivot_table
method's columns
parameter is not used, pivot_table
is equivalent to groupby
as explained here. So an equivalent (and possibly faster) approach to produce the initial pivot table result is
table = df.groupby(['A','B']).sum()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With