I'm new to Python and I would like to use Python to replicate a common excel task. If such a question has already been answered, please let me know. I've been unable to find it. I have the following pandas dataframe (data):
Date Stage SubStage Value
12/31/2015 1.00 a 0.896882891
1/1/2016 1.00 a 0.0458843
1/2/2016 1.00 a 0.126805588
1/3/2016 1.00 b 0.615824461
1/4/2016 1.00 b 0.245092069
1/5/2016 1.00 c 0.121936318
1/6/2016 1.00 c 0.170198128
1/7/2016 1.00 c 0.735872415
1/8/2016 1.00 c 0.542361912
1/4/2016 2.00 a 0.723769247
1/5/2016 2.00 a 0.305570257
1/6/2016 2.00 b 0.47461605
1/7/2016 2.00 b 0.173702623
1/8/2016 2.00 c 0.969260251
1/9/2016 2.00 c 0.017170798
In excel, I can use a pivot table to produce the following:
It seems reasonable to do the following in python:
data.pivot(index='Date',
columns=['Stage', 'SubStage'],
values='Value')
But that produces:
KeyError: 'Level Stage not found'
What gives?
In python, Pivot tables of pandas dataframes can be created using the command: pandas. pivot_table . You can aggregate a numeric column as a cross tabulation against two categorical columns. In this article, you'll see how to create pivot tables in pandas and understand its parameters with worked out examples.
Pivot Table is a table of summarized data. Pivot Chart is the visual representation of the corresponding pivot table. You can create only a pivot table. If you create a pivot chart, the corresponding pivot table will be auto-generated.
The aggfunc argument of pivot_table takes a function or list of functions but not dict. aggfunc : function, default numpy.mean, or list of functions If list of functions passed, the resulting pivot table will have hierarchical columns whose top level are the function names (inferred from the function objects themselves ...
You want .pivot_table
, not .pivot
.
import pandas
from io import StringIO
x = StringIO("""\
Date Stage SubStage Value
12/31/2015 1.00 a 0.896882891
1/1/2016 1.00 a 0.0458843
1/2/2016 1.00 a 0.126805588
1/3/2016 1.00 b 0.615824461
1/4/2016 1.00 b 0.245092069
1/5/2016 1.00 c 0.121936318
1/6/2016 1.00 c 0.170198128
1/7/2016 1.00 c 0.735872415
1/8/2016 1.00 c 0.542361912
1/4/2016 2.00 a 0.723769247
1/5/2016 2.00 a 0.305570257
1/6/2016 2.00 b 0.47461605
1/7/2016 2.00 b 0.173702623
1/8/2016 2.00 c 0.969260251
1/9/2016 2.00 c 0.017170798
""")
df = pandas.read_table(x, sep='\s+')
xtab = df.pivot_table(index='Date', columns=['Stage','SubStage'], values='Value')
print(xtab.to_string(na_rep='--'))
And that gives me:
Stage 1.0 2.0
SubStage a b c a b c
Date
1/1/2016 0.045884 -- -- -- -- --
1/2/2016 0.126806 -- -- -- -- --
1/3/2016 -- 0.615824 -- -- -- --
1/4/2016 -- 0.245092 -- 0.723769 -- --
1/5/2016 -- -- 0.121936 0.305570 -- --
1/6/2016 -- -- 0.170198 -- 0.474616 --
1/7/2016 -- -- 0.735872 -- 0.173703 --
1/8/2016 -- -- 0.542362 -- -- 0.969260
1/9/2016 -- -- -- -- -- 0.017171
12/31/2015 0.896883 -- -- -- -- --
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With