
Add subtotal columns in pandas with multi-index

Tags: python, pandas

I have a dataframe with a 3-level deep multi-index on the columns. I would like to compute subtotals across rows (sum(axis=1)) where I sum across one of the levels while preserving the others. I think I know how to do this using the level keyword argument of pd.DataFrame.sum. However, I'm having trouble thinking of how to incorporate the result of this sum back into the original table.

Setup:

import numpy as np
import pandas as pd
from itertools import product

np.random.seed(0)

colors = ['red', 'green']
shapes = ['square', 'circle']
obsnum = range(5)

rows = list(product(colors, shapes, obsnum))
idx = pd.MultiIndex.from_tuples(rows)
idx.names = ['color', 'shape', 'obsnum']

df = pd.DataFrame({'attr1': np.random.randn(len(rows)), 
                   'attr2': 100 * np.random.randn(len(rows))},
                  index=idx)

df.columns.names = ['attribute']

df = df.unstack(['color', 'shape'])

Gives a nice frame like so:

[Image: original frame]

Say I wanted to reduce the shape level. I could run:

tots = df.sum(axis=1, level=['attribute', 'color'])

to get my totals like so:

[Image: totals]
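A note for recent pandas versions: the level keyword of DataFrame.sum was deprecated in pandas 1.3 and removed in 2.0, so the call above raises an error there. A minimal sketch of an equivalent, assuming the same df as in the setup (rebuilt here so the snippet runs on its own), is to group the transposed frame by the levels to keep:

```python
import numpy as np
import pandas as pd
from itertools import product

# Rebuild the frame from the setup so this snippet is self-contained.
np.random.seed(0)
rows = list(product(['red', 'green'], ['square', 'circle'], range(5)))
idx = pd.MultiIndex.from_tuples(rows, names=['color', 'shape', 'obsnum'])
df = pd.DataFrame({'attr1': np.random.randn(len(rows)),
                   'attr2': 100 * np.random.randn(len(rows))},
                  index=idx)
df.columns.names = ['attribute']
df = df.unstack(['color', 'shape'])

# Transpose so the column levels become index levels, group by the levels
# to keep, sum over the remaining one ('shape'), and transpose back.
tots = df.T.groupby(level=['attribute', 'color']).sum().T
```

The result has one column per (attribute, color) pair, which is the same layout as the totals frame above.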

Once I have this, I'd like to tack it on to the original frame. I think I can do this in a somewhat cumbersome way:

tots = df.sum(axis=1, level=['attribute', 'color'])
newcols = pd.MultiIndex.from_tuples(list((i[0], i[1], 'sum(shape)') for i in tots.columns))
tots.columns = newcols
bigframe = pd.concat([df, tots], axis=1).sort_index(axis=1)

[Image: aggregated]

Is there a more natural way to do this?

8one6 asked Jan 02 '14 18:01



1 Answer

Here is a way without loops:

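s = df.sum(axis=1, level=[0,1]).T                # transposed subtotals: one row per (attribute, color)
s["shape"] = "sum(shape)"                        # label for the new level, stored as an ordinary column
s.set_index("shape", append=True, inplace=True)  # move that column into the index as a third level
df.combine_first(s.T)                            # transpose back and merge with the original columns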

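The trick is to work with the transposed sum. After transposing, the column groups become rows, so we can add an ordinary column that holds the label for the extra level, and we name that column exactly like the level we summed over ("shape"). set_index with append=True then turns this column into an additional index level. Finally we transpose back and combine the result with df using combine_first, which takes the union of the columns and sorts them. If the summed level is not the last one, you may need to reorder the levels before combining.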
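For pandas 2.0 and later, where sum no longer accepts level, the same idea might be sketched as below. The helper name add_subtotals and its parameters are made up for illustration and are not part of pandas. It assumes that all column levels are named and that no row label equals the name of the summed level. It also covers the reordering case, by restoring the original level order before combining.

```python
import numpy as np
import pandas as pd
from itertools import product

# Rebuild the frame from the question so this snippet is self-contained.
np.random.seed(0)
rows = list(product(['red', 'green'], ['square', 'circle'], range(5)))
idx = pd.MultiIndex.from_tuples(rows, names=['color', 'shape', 'obsnum'])
df = pd.DataFrame({'attr1': np.random.randn(len(rows)),
                   'attr2': 100 * np.random.randn(len(rows))},
                  index=idx)
df.columns.names = ['attribute']
df = df.unstack(['color', 'shape'])


def add_subtotals(frame, level, label):
    """Hypothetical helper: add columns that sum over one named column level."""
    names = list(frame.columns.names)
    keep = [n for n in names if n != level]
    # Transposed sum: one row per combination of the kept levels.
    s = frame.T.groupby(level=keep).sum()
    # Store the subtotal label in a column named like the summed level ...
    s[level] = label
    # ... and move it into the index as an extra (innermost) level.
    s = s.set_index(level, append=True)
    # Put the levels back in the original order (needed when the summed
    # level is not the last one).
    s = s.reorder_levels(names)
    # Transpose back; reuse the original row index, since adding the label
    # column may have changed the dtype of the row labels.
    sub = s.T.set_axis(frame.index, axis=0)
    # Union of old and new columns, sorted.
    return frame.combine_first(sub)


bigframe = add_subtotals(df, 'shape', 'sum(shape)')
by_color = add_subtotals(df, 'color', 'sum(color)')
```

Here bigframe corresponds to the aggregated frame in the question, and by_color shows the case where the summed level sits in the middle of the column index.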

cronos answered Sep 29 '22 15:09
