How to access MultiIndex column after groupby in pandas?

With single-level columns, a column can be selected directly from the groupby object:

import pandas as pd

df1 = pd.DataFrame({'a': [2, 2, 4, 4], 'b': [5, 6, 7, 8]})
df1.groupby('a')['b'].sum() ->

a
2    11
4    15

But with MultiIndex columns, when grouping by a column rather than by level, the columns are no longer accessible from the groupby object:

df = pd.concat([df1, df1], keys=['c', 'd'], axis=1)
df -> 

   c     d
   a  b  a  b
0  2  5  2  5
1  2  6  2  6
2  4  7  4  7
3  4  8  4  8

df.groupby([('c','a')])[('c','b')].sum() -> 
KeyError: "Columns not found: 'b', 'c'"

As a workaround, this works, but it is inefficient because it bypasses the Cythonized aggregator, and it looks awkward:

df.groupby([('c','a')]).apply(lambda df: df[('c', 'b')].sum())

Is there a way to access a MultiIndex column in a groupby object that I missed?

polyglot asked Aug 02 '16 18:08


1 Answer

Adding a comma after your ('c','b') tuple seems to work:

df.groupby([('c','a')])[('c','b'),].sum()

I'm guessing that without the comma, pandas interprets the elements of the tuple as separate column labels ('c' and 'b'), which would match the KeyError in the question.
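
The trailing comma turns the key into a one-element tuple, so it acts like a one-element list of column labels. The sketch below spells that out with an explicit list, and also shows an alternative that selects the column first and then groups it by the key column. Neither form comes from the answer above. How groupby indexing treats tuple keys has varied between pandas versions, so the explicit list may be the safer spelling. The names `summed`, `via_list` and `via_series` are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [2, 2, 4, 4], 'b': [5, 6, 7, 8]})
df = pd.concat([df1, df1], keys=['c', 'd'], axis=1)

# Explicit list holding one tuple label: selects the single column ('c', 'b').
# The result is a one-column DataFrame rather than a Series.
summed = df.groupby([('c', 'a')])[[('c', 'b')]].sum()
via_list = summed[('c', 'b')]  # pull the column back out as a Series

# Alternative: pick the column as a Series, then group it by the key column.
via_series = df[('c', 'b')].groupby(df[('c', 'a')]).sum()

print(via_list.tolist())    # [11, 15]
print(via_series.tolist())  # [11, 15]
```

Both forms call the built-in `sum` aggregation directly instead of going through `apply` with a lambda.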

root answered Nov 07 '22 08:11