Python pandas: group by one level of a MultiIndex but keep the other levels

Tags: python, pandas

Suppose that I have a DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(0, 24).reshape((3, 8)))
df.columns = pd.MultiIndex.from_arrays([
    ['a1', 'a1', 'a2', 'a2', 'b1', 'b1', 'b2', 'b2'],
    ['4th', '5th', '4th', '5th', '4th', '5th', '4th', '5th']
])
print(df)

output:

       a1      a2      b1      b2    
  4th 5th 4th 5th 4th 5th 4th 5th
0   0   1   2   3   4   5   6   7
1   8   9  10  11  12  13  14  15
2  16  17  18  19  20  21  22  23

I want to group the first column level using a dict:

label_dict = {'a1': 'A', 'a2': 'A', 'b1': 'B', 'b2': 'B'}
res = df.groupby(label_dict, axis=1, level=0).sum()
print(res)

output:

    A   B
0   6  22
1  38  54
2  70  86

but what I want is:

    A   A   B   B
  4th 5th 4th 5th
0   2   4  10  12
1  18  20  26  28
2  34  36  42  44

Any ideas? Thanks!

Alvin Liu asked May 31 '18 12:05


1 Answer

Use rename to map the first level of the column MultiIndex to the new labels, then sum over both levels:

label_dict = {'a1': 'A', 'a2': 'A', 'b1': 'B', 'b2': 'B'}

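# Note: sum(level=...) was deprecated in pandas 1.3 and removed in 2.0
res = df.rename(columns=label_dict, level=0).sum(level=[0, 1], axis=1)
# Alternative with groupby (the axis=1 argument is deprecated since pandas 2.1)
# res = df.rename(columns=label_dict, level=0).groupby(level=[0, 1], axis=1).sum()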
print(res)
    A       B    
  4th 5th 4th 5th
0   2   4  10  12
1  18  20  26  28
2  34  36  42  44
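
On newer pandas versions the same idea can be written without the level argument of sum or the axis=1 argument of groupby. The following is only a sketch, not part of the original answer: it assumes that transposing the frame is acceptable (all values here are integers, so the dtype survives the round trip), and it groups the transposed rows by both index levels before transposing back.

```python
import numpy as np
import pandas as pd

# Same example frame as in the question
df = pd.DataFrame(np.arange(0, 24).reshape((3, 8)))
df.columns = pd.MultiIndex.from_arrays([
    ['a1', 'a1', 'a2', 'a2', 'b1', 'b1', 'b2', 'b2'],
    ['4th', '5th', '4th', '5th', '4th', '5th', '4th', '5th']
])

label_dict = {'a1': 'A', 'a2': 'A', 'b1': 'B', 'b2': 'B'}

# 1. Relabel only the first column level (a1/a2 -> A, b1/b2 -> B)
renamed = df.rename(columns=label_dict, level=0)

# 2. Transpose so the column MultiIndex becomes the row index,
#    sum rows that share the same (level 0, level 1) pair,
#    then transpose back to the original orientation
res = renamed.T.groupby(level=[0, 1]).sum().T
print(res)
```

This should give the same table as above. With mixed column dtypes the transpose would upcast to object, so in that case the groupby variant with axis=1 on an older pandas may be the safer choice.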
jezrael answered Nov 10 '22 00:11