Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas pivot_table with pd.grouper and Margins

Margins=True will not work in Pandas pivot_table when columns is set as pd.grouper datetime. this is my code which works as expected--

p = df.pivot_table(values='Qty', index=['ItemCode', 'LineItem'],columns=pd.Grouper(key = 'Date', freq='W'), aggfunc=np.sum, fill_value=0)

but if I add margins=True, so I get a subtotal, I get error saying:

KeyError: "[TimeGrouper(key='In time', freq=, axis=0, sort=True, closed='left', label='left', how='mean', convention='e', base=0)] not in index"

like image 925
suraksha adhikari gautam Avatar asked Jun 17 '18 01:06

suraksha adhikari gautam


People also ask

Which of the following are pivot_table () used for?

A Pivot Table is used to summarise, sort, reorganise, group, count, total or average data stored in a table. It allows us to transform columns into rows and rows into columns.

What is PD grouper?

class pandas. Grouper(*args, **kwargs)[source] A Grouper allows the user to specify a groupby instruction for an object. This specification will select a column via the key parameter, or if the level and/or axis parameters are given, a level of the index of the target object.

What does pivot_table do in pandas?

Pivot table in pandas is an excellent tool to summarize one or more numeric variable based on two other categorical variables. Pivot tables in pandas are popularly seen in MS Excel files. In python, Pivot tables of pandas dataframes can be created using the command: pandas. pivot_table .

How do you reshape a pandas Dataframe?

You can use the following basic syntax to convert a pandas DataFrame from a wide format to a long format: df = pd. melt(df, id_vars='col1', value_vars=['col2', 'col3', ...]) In this scenario, col1 is the column we use as an identifier and col2, col3, etc.


1 Answers

That looks strange! I wonder what causes the pivot table to use the TimeGrouper itself to be used as the index. It's seems like a bug, but I'm not sure. In any case, I think pivottables aren't able to do sub-index margins, so here is a solution with groupby instead:

Sample data

import pandas as pd
from random import randint, choice
from string import ascii_letters, ascii_lowercase

# Say we have a dataframe with 500 rows and 20 different items
df_len = range(500)
item_codes = [''.join([choice(ascii_letters) for _ in range(10)]) for __ in range(20)]
df = pd.DataFrame({
    'ItemCode': [choice(item_codes) for __ in df_len],
    'Date': [pd.datetime.today() - pd.Timedelta(randint(0, 28), 'D') for _ in df_len],
    'Qty': [randint(1,10) for _ in df_len],
    'LineItem': [choice(('a', 'b', 'c')) for _ in df_len],
})

df.head()

     ItemCode                       Date  Qty LineItem
0  IFaEmWGHTJ 2020-05-21 13:29:56.687412    8        a
1  jvLqoLfBcd 2020-05-23 13:29:56.687509    6        a
2  GOPFJEoSUm 2020-05-13 13:29:56.687550    1        a
3  qJqzzgDTaa 2020-05-03 13:29:56.687575    5        a
4  BCvRrgcpFD 2020-05-24 13:29:56.690114    8        b

Solution

res = (df.groupby(['ItemCode', 'LineItem', pd.Grouper(key='Date', freq='W')])['Qty']
       .count()
       .unstack()
       .fillna(0))
res.loc[('column_total', ''), :] = res.sum(axis=0)
res.loc[:,'row_total'] = res.sum(axis=1)

Result

|                      |   2020-05-03 |   2020-05-10 |   2020-05-17 |   2020-05-24 |   2020-05-31 |   row_total |
|:---------------------|-------------:|-------------:|-------------:|-------------:|-------------:|------------:|
| ('CtdClujjRF', 'a')  |            1 |            2 |            2 |            0 |            0 |           5 |
| ('CtdClujjRF', 'b')  |            0 |            3 |            1 |            1 |            1 |           6 |
| ('CtdClujjRF', 'c')  |            1 |            1 |            2 |            2 |            1 |           7 |
| ('DnQcEbHoVL', 'a')  |            0 |            2 |            1 |            1 |            1 |           5 |
| ('DnQcEbHoVL', 'b')  |            1 |            1 |            1 |            2 |            2 |           7 |
                     ...            ...            ...            ...            ...            ...           ...
| ('sxFnkCcSJu', 'c')  |            0 |            2 |            2 |            3 |            0 |           7 |
| ('vOaWNHgOgm', 'a')  |            0 |            5 |            1 |            7 |            1 |          14 |
| ('vOaWNHgOgm', 'b')  |            1 |            0 |            1 |            3 |            4 |           9 |
| ('vOaWNHgOgm', 'c')  |            1 |            2 |            2 |            5 |            1 |          11 |
| ('column_total', '') |           64 |          128 |          115 |          127 |           66 |         500 |
like image 51
Bertil Johannes Ipsen Avatar answered Oct 23 '22 23:10

Bertil Johannes Ipsen