Margins=True
will not work in Pandas pivot_table when columns is set as pd.grouper datetime
. this is my code which works as expected--
p = df.pivot_table(values='Qty', index=['ItemCode', 'LineItem'],columns=pd.Grouper(key = 'Date', freq='W'), aggfunc=np.sum, fill_value=0)
but if I add margins=True
, so I get a subtotal, I get error saying:
KeyError: "[TimeGrouper(key='In time', freq=, axis=0, sort=True, closed='left', label='left', how='mean', convention='e', base=0)] not in index"
A Pivot Table is used to summarise, sort, reorganise, group, count, total or average data stored in a table. It allows us to transform columns into rows and rows into columns.
class pandas. Grouper(*args, **kwargs)[source] A Grouper allows the user to specify a groupby instruction for an object. This specification will select a column via the key parameter, or if the level and/or axis parameters are given, a level of the index of the target object.
Pivot table in pandas is an excellent tool to summarize one or more numeric variable based on two other categorical variables. Pivot tables in pandas are popularly seen in MS Excel files. In python, Pivot tables of pandas dataframes can be created using the command: pandas. pivot_table .
You can use the following basic syntax to convert a pandas DataFrame from a wide format to a long format: df = pd. melt(df, id_vars='col1', value_vars=['col2', 'col3', ...]) In this scenario, col1 is the column we use as an identifier and col2, col3, etc.
That looks strange! I wonder what causes the pivot table to use the TimeGrouper itself to be used as the index. It's seems like a bug, but I'm not sure. In any case, I think pivottables aren't able to do sub-index margins, so here is a solution with groupby instead:
Sample data
import pandas as pd
from random import randint, choice
from string import ascii_letters, ascii_lowercase
# Say we have a dataframe with 500 rows and 20 different items
df_len = range(500)
item_codes = [''.join([choice(ascii_letters) for _ in range(10)]) for __ in range(20)]
df = pd.DataFrame({
'ItemCode': [choice(item_codes) for __ in df_len],
'Date': [pd.datetime.today() - pd.Timedelta(randint(0, 28), 'D') for _ in df_len],
'Qty': [randint(1,10) for _ in df_len],
'LineItem': [choice(('a', 'b', 'c')) for _ in df_len],
})
df.head()
ItemCode Date Qty LineItem
0 IFaEmWGHTJ 2020-05-21 13:29:56.687412 8 a
1 jvLqoLfBcd 2020-05-23 13:29:56.687509 6 a
2 GOPFJEoSUm 2020-05-13 13:29:56.687550 1 a
3 qJqzzgDTaa 2020-05-03 13:29:56.687575 5 a
4 BCvRrgcpFD 2020-05-24 13:29:56.690114 8 b
Solution
res = (df.groupby(['ItemCode', 'LineItem', pd.Grouper(key='Date', freq='W')])['Qty']
.count()
.unstack()
.fillna(0))
res.loc[('column_total', ''), :] = res.sum(axis=0)
res.loc[:,'row_total'] = res.sum(axis=1)
Result
| | 2020-05-03 | 2020-05-10 | 2020-05-17 | 2020-05-24 | 2020-05-31 | row_total |
|:---------------------|-------------:|-------------:|-------------:|-------------:|-------------:|------------:|
| ('CtdClujjRF', 'a') | 1 | 2 | 2 | 0 | 0 | 5 |
| ('CtdClujjRF', 'b') | 0 | 3 | 1 | 1 | 1 | 6 |
| ('CtdClujjRF', 'c') | 1 | 1 | 2 | 2 | 1 | 7 |
| ('DnQcEbHoVL', 'a') | 0 | 2 | 1 | 1 | 1 | 5 |
| ('DnQcEbHoVL', 'b') | 1 | 1 | 1 | 2 | 2 | 7 |
... ... ... ... ... ... ...
| ('sxFnkCcSJu', 'c') | 0 | 2 | 2 | 3 | 0 | 7 |
| ('vOaWNHgOgm', 'a') | 0 | 5 | 1 | 7 | 1 | 14 |
| ('vOaWNHgOgm', 'b') | 1 | 0 | 1 | 3 | 4 | 9 |
| ('vOaWNHgOgm', 'c') | 1 | 2 | 2 | 5 | 1 | 11 |
| ('column_total', '') | 64 | 128 | 115 | 127 | 66 | 500 |
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With