Suppose that I have a DataFrame:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(0, 24).reshape((3, 8)))
df.columns = pd.MultiIndex.from_arrays([
['a1', 'a1', 'a2', 'a2', 'b1', 'b1', 'b2', 'b2'],
['4th', '5th', '4th', '5th', '4th', '5th', '4th', '5th']
])
print(df)
output:
   a1      a2      b1      b2
  4th 5th 4th 5th 4th 5th 4th 5th
0   0   1   2   3   4   5   6   7
1   8   9  10  11  12  13  14  15
2  16  17  18  19  20  21  22  23
I want to group the columns by a dict:
label_dict = {'a1': 'A', 'a2': 'A', 'b1': 'B', 'b2': 'B'}
res = df.groupby(label_dict, axis=1, level=0).sum()
print(res)
output:
    A   B
0   6  22
1  38  54
2  70  86
but what I want is:
    A   A   B   B
  4th 5th 4th 5th
0   2   4  10  12
1  18  20  26  28
2  34  36  42  44
Any ideas? Thanks!
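Use rename on the first level of the columns, then sum by both levels of the MultiIndex in columns:
label_dict = {'a1': 'A', 'a2': 'A', 'b1': 'B', 'b2': 'B'}
# Relabel level 0, then sum the columns that share the same (level 0, level 1) pair.
# Grouping the transposed frame avoids axis=1, which newer pandas versions deprecate.
res = df.rename(columns=label_dict, level=0).T.groupby(level=[0, 1]).sum().T
#alternatives for older pandas versions
#(sum(level=...) was removed in pandas 2.0; groupby(axis=1) is deprecated since pandas 2.1)
#res = df.rename(columns=label_dict, level=0).sum(level=[0,1], axis=1)
#res = df.rename(columns=label_dict, level=0).groupby(level=[0,1], axis=1).sum()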
print(res)
    A       B
  4th 5th 4th 5th
0   2   4  10  12
1  18  20  26  28
2  34  36  42  44
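If you would rather leave the original column labels untouched, a variant is to build the group keys yourself and group the transposed frame by them. This is only a sketch under a few assumptions: the names top, sub and res2 are made up for illustration, and it relies on every first-level label being present in label_dict (unmapped labels would become NaN and drop out of the groups).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(0, 24).reshape((3, 8)))
df.columns = pd.MultiIndex.from_arrays([
    ['a1', 'a1', 'a2', 'a2', 'b1', 'b1', 'b2', 'b2'],
    ['4th', '5th', '4th', '5th', '4th', '5th', '4th', '5th'],
])
label_dict = {'a1': 'A', 'a2': 'A', 'b1': 'B', 'b2': 'B'}

# One group key per column: the mapped level-0 label and the original level-1 label.
# (top and sub are illustrative names, not part of the original answer.)
top = df.columns.get_level_values(0).map(label_dict).to_numpy()
sub = df.columns.get_level_values(1).to_numpy()

# Columns become rows after transposing, so a plain row-wise groupby can sum them;
# transposing back restores the original orientation. df itself is left unchanged.
res2 = df.T.groupby([top, sub]).sum().T
print(res2)
```

The rename approach is shorter; this one may be handier when you need to keep df's own labels around for later steps.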