
Groupby column suffix in pandas

Given a dataframe:

     val1_aa  val1_bb  val2_aa  val2_bb  val2_cc  val3_cc
id                                                       
100        6        0        3        4        3        1
200        0        1        0        0        1        0
300        5        1        1        0        4        0
400        0        3        1        5        7        1
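
For anyone who wants to reproduce this, the frame above can be built roughly as follows. This is only a sketch: the integer dtype and the index name `id` are read off the printed output, not stated anywhere else.

```python
import pandas as pd

# Values copied from the printed frame; dtype and index name are assumptions.
df = pd.DataFrame(
    {
        "val1_aa": [6, 0, 5, 0],
        "val1_bb": [0, 1, 1, 3],
        "val2_aa": [3, 0, 1, 1],
        "val2_bb": [4, 0, 0, 5],
        "val2_cc": [3, 1, 4, 7],
        "val3_cc": [1, 0, 0, 1],
    },
    index=pd.Index([100, 200, 300, 400], name="id"),
)
print(df)
```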

I'd like to sum the columns that share the same suffix (the part after the underscore) in their column header. My desired output:

     aa  bb  cc
id             
100   9   4   4
200   0   1   1
300   6   1   4
400   1   8   8

How do I get this?


Answer posted below.

asked Dec 14 '25 17:12 by cs95


2 Answers

You can use str.extract to pull the suffix out of each column name and group the columns on it:

df = df.groupby(df.columns.str.extract('_(.*)', expand=False), axis=1).sum()
print(df)
     aa  bb  cc
id             
100   9   4   4
200   0   1   1
300   6   1   4
400   1   8   8
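
A note for newer pandas: as far as I know, passing `axis=1` to `DataFrame.groupby` was deprecated in pandas 2.1, with the suggestion to transpose and group the rows instead. A minimal sketch of that variant is below; the frame is rebuilt from the question so the snippet runs on its own, and the names `suffix` and `result` are just illustrative.

```python
import pandas as pd

# Frame rebuilt from the question (values copied from the printed output).
df = pd.DataFrame(
    {
        "val1_aa": [6, 0, 5, 0],
        "val1_bb": [0, 1, 1, 3],
        "val2_aa": [3, 0, 1, 1],
        "val2_bb": [4, 0, 0, 5],
        "val2_cc": [3, 1, 4, 7],
        "val3_cc": [1, 0, 0, 1],
    },
    index=pd.Index([100, 200, 300, 400], name="id"),
)

# Everything after the first underscore becomes the group key.
suffix = df.columns.str.extract("_(.*)", expand=False)

# Transpose so the original columns become rows, group those rows by suffix,
# sum, then transpose back to the original orientation.
result = df.T.groupby(suffix).sum().T
print(result)
```

Unlike the line above, this keeps the original `df` intact and stores the sums in a separate variable.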

Another solution is to split the column names into a MultiIndex and group on its second level:

df.columns = df.columns.str.split('_', expand=True)
print(df)
    val1    val2       val3
      aa bb   aa bb cc   cc
id                         
100    6  0    3  4  3    1
200    0  1    0  0  1    0
300    5  1    1  0  4    0
400    0  3    1  5  7    1

df = df.groupby(level=1, axis=1).sum()
print(df)
     aa  bb  cc
id             
100   9   4   4
200   0   1   1
300   6   1   4
400   1   8   8
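
The MultiIndex approach overwrites `df.columns`, so the flat names are gone afterwards. If that matters, the same idea can be applied to a copy. The sketch below does that and groups the transposed frame by level, which should also sidestep the `axis=1` argument on pandas versions that no longer accept it. The names `wide` and `result` are illustrative only.

```python
import pandas as pd

# Frame rebuilt from the question (values copied from the printed output).
df = pd.DataFrame(
    {
        "val1_aa": [6, 0, 5, 0],
        "val1_bb": [0, 1, 1, 3],
        "val2_aa": [3, 0, 1, 1],
        "val2_bb": [4, 0, 0, 5],
        "val2_cc": [3, 1, 4, 7],
        "val3_cc": [1, 0, 0, 1],
    },
    index=pd.Index([100, 200, 300, 400], name="id"),
)

# Work on a copy so the original flat column names survive.
wide = df.copy()
wide.columns = wide.columns.str.split("_", expand=True)  # two-level MultiIndex

# After transposing, the suffix is level 1 of the row index.
result = wide.T.groupby(level=1).sum().T
print(result)
```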
answered Dec 16 '25 19:12 by jezrael


You could group the columns with a lambda function on axis=1:

In [4178]: df.groupby(lambda x: x.split('_')[-1], axis=1).sum()  # or x.split('_')[1]
Out[4178]:
     aa  bb  cc
id
100   9   4   4
200   0   1   1
300   6   1   4
400   1   8   8
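
The two indexing choices in the comment, `[-1]` and `[1]`, agree for the column names in the question but can differ once a name contains more than one underscore. A small plain-Python sketch of the difference follows; the name `val_x_bb` is hypothetical and does not occur in the question's data. The regex `_(.*)` is included for comparison because it keeps everything after the first underscore.

```python
import re

def last_part(name):
    # Text after the last underscore.
    return name.split("_")[-1]

def second_part(name):
    # Second underscore-separated piece.
    return name.split("_")[1]

def after_first_underscore(name):
    # Everything after the first underscore, as the pattern '_(.*)' captures it.
    return re.search("_(.*)", name).group(1)

for name in ["val1_aa", "val_x_bb"]:  # the second name is a made-up example
    print(name, last_part(name), second_part(name), after_first_underscore(name))
```

For single-underscore names all three give the same key, so any of them reproduces the desired output here.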
answered Dec 16 '25 18:12 by Zero