With a single-indexed dataframe, the columns are available in the group by object:
import pandas as pd
df1 = pd.DataFrame({'a': [2, 2, 4, 4], 'b': [5, 6, 7, 8]})
df1.groupby('a')['b'].sum() ->
a
2    11
4    15
But in a MultiIndex dataframe, when not grouping by level, the columns are no longer accessible in the group by object:
df = pd.concat([df1, df1], keys=['c', 'd'], axis=1)
df ->
   c     d
   a  b  a  b
0  2  5  2  5
1  2  6  2  6
2  4  7  4  7
3  4  8  4  8
df.groupby([('c','a')])[('c','b')].sum() ->
KeyError: "Columns not found: 'b', 'c'"
As a workaround, the following works, but it is inefficient because it doesn't use the Cythonized aggregator, and it looks awkward:
df.groupby([('c','a')]).apply(lambda g: g[('c', 'b')].sum())
Is there a way to access a MultiIndex column in a groupby object that I missed?
Adding a comma after your ('c','b') tuple seems to work:
df.groupby([('c','a')])[('c','b'),].sum()
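I'm guessing that without the comma, pandas interprets the tuple's two elements as separate column names ('c' and 'b'), which would match the KeyError above. With the comma, the key is a one-element tuple whose single item is the full column label ('c','b').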
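How a bare tuple inside the groupby brackets is treated seems to depend on the pandas version, so it may be safer to avoid relying on the trailing comma. Below is a minimal sketch of two alternatives, assuming the same df as in the question; the names by_list and by_series are only illustrative. The first passes a list containing the column tuple, the second selects the column as a Series and groups it by the key column. Both should go through the built-in sum rather than apply.

```python
import pandas as pd

# Rebuild the frame from the question: two copies of df1 under the top-level keys 'c' and 'd'.
df1 = pd.DataFrame({'a': [2, 2, 4, 4], 'b': [5, 6, 7, 8]})
df = pd.concat([df1, df1], keys=['c', 'd'], axis=1)

# Option 1: pass a list holding the full column label; the result is a one-column DataFrame.
by_list = df.groupby([('c', 'a')])[[('c', 'b')]].sum()

# Option 2: pull the column out as a Series, then group it by the key column's values.
by_series = df[('c', 'b')].groupby(df[('c', 'a')]).sum()

print(by_list)
print(by_series)
```

Option 1 keeps the MultiIndex column header in the result, while option 2 returns a plain Series, which is closer to what the single-index example produces.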