I have a problem when using a groupby
/apply
chain on a MultiIndex
DataFrame
: The resulting data frame contains the grouped level(s) twice!
Example data frame:
df = pandas.DataFrame(dict(a=[1, 2, 3, 4, 5, 6], b=[6, 7, 8, 9, 0, 0]),
index=pandas.MultiIndex.from_product([[1, 2], [3, 4, 5]]))
a b
1 3 1 6
4 2 7
5 3 8
2 3 4 9
4 5 0
5 6 0
In this example, I simply sum over the rows (I actually have another function, but one that also accepts and returns a dataframe):
df.groupby(level=0).apply(lambda x: x.sum(axis=1))
And it gives me the following result:
1 1 3 7
4 9
5 11
2 2 3 13
4 5
5 6
dtype: int64
So now I have three index levels, with the one grouped by doubled. When I group by both levels, the whole multiindex in doubled:
df.groupby(level=[0,1]).apply(lambda x: x.sum(axis=1))
1 3 1 3 7
4 1 4 9
5 1 5 11
2 3 2 3 13
4 2 4 5
5 2 5 6
dtype: int64
If I set as_index=False
, I still get another index level, containing ascending numbers:
df.groupby(level=[0,1], as_index=False).apply(lambda x: x.sum(axis=1))
0 1 3 7
1 1 4 9
2 1 5 11
3 2 3 13
4 2 4 5
5 2 5 6
dtype: int64
Is this intended behavior? How can I avoid that another index level is created? Do I have to remove it by hand every time I do a groupby/apply operation?
There's also the group_keys
option which leaves only the MultiIndex
of the Series
returned by the .sum(axis=1)
applied to the individual chunks:
df = pandas.DataFrame(dict(a=[1, 2, 3, 4, 5, 6], b=[6, 7, 8, 9, 0, 0]),
index=pandas.MultiIndex.from_product([[1, 2], [3, 4, 5]]))
print(df.groupby(level=0, group_keys=False).apply(lambda x: x.sum(axis=1)))
1 3 7
4 9
5 11
2 3 13
4 5
5 6
dtype: int64
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With