Groupby column suffix in pandas

Question

Given a dataframe:

     val1_aa  val1_bb  val2_aa  val2_bb  val2_cc  val3_cc
id                                                       
100        6        0        3        4        3        1
200        0        1        0        0        1        0
300        5        1        1        0        4        0
400        0        3        1        5        7        1

I'd like to sum the columns grouped by the suffix of each column header (the part after the underscore). My desired output:

     aa  bb  cc
id             
100   9   4   4
200   0   1   1
300   6   1   4
400   1   8   8

How do I get this?
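To make the example reproducible, the sample frame can be built as below. This is a sketch: the values are copied row by row from the table above, and treating id as the name of the index is an assumption based on how the frame prints.

```python
import pandas as pd

# Sample data copied from the question, one list per row of the table.
# Assumption: "id" is the index name, not a regular column.
df = pd.DataFrame(
    [[6, 0, 3, 4, 3, 1],
     [0, 1, 0, 0, 1, 0],
     [5, 1, 1, 0, 4, 0],
     [0, 3, 1, 5, 7, 1]],
    columns=["val1_aa", "val1_bb", "val2_aa", "val2_bb", "val2_cc", "val3_cc"],
    index=pd.Index([100, 200, 300, 400], name="id"),
)
print(df)
```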


jezrael · Accepted Answer

You can use str.extract to pull out the text after the underscore in each column name and group the columns by it:

df = df.groupby(df.columns.str.extract('_(.*)', expand=False), axis=1).sum()
print(df)
     aa  bb  cc
id             
100   9   4   4
200   0   1   1
300   6   1   4
400   1   8   8
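In newer pandas releases (2.1 and later, as far as I know), groupby(..., axis=1) emits a deprecation warning and the suggested replacement is to transpose. A minimal sketch of the same extract-based grouping done that way, assuming the sample data from the question:

```python
import pandas as pd

# Sample data from the question (index name "id" assumed)
df = pd.DataFrame(
    [[6, 0, 3, 4, 3, 1],
     [0, 1, 0, 0, 1, 0],
     [5, 1, 1, 0, 4, 0],
     [0, 3, 1, 5, 7, 1]],
    columns=["val1_aa", "val1_bb", "val2_aa", "val2_bb", "val2_cc", "val3_cc"],
    index=pd.Index([100, 200, 300, 400], name="id"),
)

# Suffix after the first underscore, same regex as the answer above
suffix = df.columns.str.extract("_(.*)", expand=False)

# Transpose so the columns become rows, group those rows by suffix, transpose back
out = df.T.groupby(suffix).sum().T
print(out)
```

The result has the same aa, bb and cc columns as the axis=1 version, without relying on the deprecated argument.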

Another solution is to split the column names into a MultiIndex and group by its second level (the suffix):

df.columns = df.columns.str.split('_', expand=True)
print(df)
    val1    val2       val3
      aa bb   aa bb cc   cc
id                         
100    6  0    3  4  3    1
200    0  1    0  0  1    0
300    5  1    1  0  4    0
400    0  3    1  5  7    1

df = df.groupby(level=1, axis=1).sum()
print(df)
     aa  bb  cc
id             
100   9   4   4
200   0   1   1
300   6   1   4
400   1   8   8
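The split above relies on every name containing exactly one underscore; a prefix that itself contained an underscore would push the suffix into a later level. A sketch that splits only on the last underscore with rsplit(n=1) and groups the transposed frame instead of using axis=1 (the underscore-in-prefix case is an assumption about other data, not something in the question):

```python
import pandas as pd

# Sample data from the question (index name "id" assumed)
df = pd.DataFrame(
    [[6, 0, 3, 4, 3, 1],
     [0, 1, 0, 0, 1, 0],
     [5, 1, 1, 0, 4, 0],
     [0, 3, 1, 5, 7, 1]],
    columns=["val1_aa", "val1_bb", "val2_aa", "val2_bb", "val2_cc", "val3_cc"],
    index=pd.Index([100, 200, 300, 400], name="id"),
)

# Split on the last underscore only, so level 1 is always the suffix
df.columns = df.columns.str.rsplit("_", n=1, expand=True)

# Group the rows of the transposed frame by the suffix level, then transpose back
out = df.T.groupby(level=1).sum().T
print(out)
```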

Zero · Answer

You could use groupby with a lambda function on axis=1:

In [4178]: df.groupby(lambda x: x.split('_')[-1], axis=1).sum()  # or x.split('_')[1]
Out[4178]:
     aa  bb  cc
id
100   9   4   4
200   0   1   1
300   6   1   4
400   1   8   8
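Passing a callable works because groupby calls it once on each label of the grouped axis and uses the return value as the group key. The same grouping can be spelled out as an explicit dict from column name to suffix, which makes the keys easy to inspect. A sketch, again transposing instead of using axis=1 and assuming the question's sample data:

```python
import pandas as pd

# Sample data from the question (index name "id" assumed)
df = pd.DataFrame(
    [[6, 0, 3, 4, 3, 1],
     [0, 1, 0, 0, 1, 0],
     [5, 1, 1, 0, 4, 0],
     [0, 3, 1, 5, 7, 1]],
    columns=["val1_aa", "val1_bb", "val2_aa", "val2_bb", "val2_cc", "val3_cc"],
    index=pd.Index([100, 200, 300, 400], name="id"),
)

# Map each column name to its suffix (the text after the last underscore),
# which is exactly what the lambda computes for each label
mapping = {col: col.split("_")[-1] for col in df.columns}
print(mapping)

# With a dict, groupby looks up each index label of the transposed frame
out = df.T.groupby(mapping).sum().T
print(out)
```

Using [-1] rather than [1] keeps the last piece even if a name has more than one underscore.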

Tags: python, pandas, dataframe, group-by, pandas-groupby

cs95
