Given a dataframe:
     val1_aa  val1_bb  val2_aa  val2_bb  val2_cc  val3_cc
id
100        6        0        3        4        3        1
200        0        1        0        0        1        0
300        5        1        1        0        4        0
400        0        3        1        5        7        1
I'd like to sum the columns that share the same suffix in their header (the part after the underscore). My desired output:
     aa  bb  cc
id
100   9   4   4
200   0   1   1
300   6   1   4
400   1   8   8
How do I get this?
Answer posted below.
You can use str.extract to pull the suffix out of each column name, then group the columns by it and sum:
df = df.groupby(df.columns.str.extract('_(.*)', expand=False), axis=1).sum()
print(df)
     aa  bb  cc
id
100   9   4   4
200   0   1   1
300   6   1   4
400   1   8   8
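On pandas 2.1 and later, groupby(..., axis=1) emits a FutureWarning because the axis keyword is deprecated there. A minimal sketch of the same str.extract approach that avoids it by transposing, grouping the rows, and transposing back; the DataFrame constructor call just rebuilds the sample data from the question:

```python
import pandas as pd

# Sample data from the question, with "id" as the index name.
df = pd.DataFrame(
    {
        "val1_aa": [6, 0, 5, 0],
        "val1_bb": [0, 1, 1, 3],
        "val2_aa": [3, 0, 1, 1],
        "val2_bb": [4, 0, 0, 5],
        "val2_cc": [3, 1, 4, 7],
        "val3_cc": [1, 0, 0, 1],
    },
    index=pd.Index([100, 200, 300, 400], name="id"),
)

# Grouping key: everything after the first underscore of each column name.
suffix = df.columns.str.extract("_(.*)", expand=False)
print(list(suffix))  # ['aa', 'bb', 'aa', 'bb', 'cc', 'cc']

# Transpose so the columns become rows, sum the rows sharing a suffix,
# then transpose back so the suffixes are the columns again.
result = df.T.groupby(suffix).sum().T
print(result)
```

The transpose costs a copy of the frame, which is negligible for data of this size.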
Another solution is to split the headers into a MultiIndex and group by its second level:
df.columns = df.columns.str.split('_', expand=True)
print(df)
     val1      val2          val3
       aa  bb    aa  bb  cc    cc
id
100     6   0     3   4   3     1
200     0   1     0   0   1     0
300     5   1     1   0   4     0
400     0   3     1   5   7     1
df = df.groupby(level=1, axis=1).sum()
print(df)
     aa  bb  cc
id
100   9   4   4
200   0   1   1
300   6   1   4
400   1   8   8
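The MultiIndex variant can also be written without axis=1. A sketch assuming the same sample data: after the transpose, the suffix level of the columns becomes level 1 of the row index, so a plain groupby(level=1) does the work:

```python
import pandas as pd

# Sample data from the question, with "id" as the index name.
df = pd.DataFrame(
    {
        "val1_aa": [6, 0, 5, 0],
        "val1_bb": [0, 1, 1, 3],
        "val2_aa": [3, 0, 1, 1],
        "val2_bb": [4, 0, 0, 5],
        "val2_cc": [3, 1, 4, 7],
        "val3_cc": [1, 0, 0, 1],
    },
    index=pd.Index([100, 200, 300, 400], name="id"),
)

# Split each "valN_xx" header into a two-level MultiIndex ("valN", "xx").
df.columns = df.columns.str.split("_", expand=True)

# Level 1 holds the suffix; group the transposed rows by it and flip back.
result = df.T.groupby(level=1).sum().T
print(result)
```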
You could also group with a lambda function on axis=1; it is called on each column name and returns the suffix:
df.groupby(lambda x: x.split('_')[-1], axis=1).sum()  # or x.split('_')[1]
     aa  bb  cc
id
100   9   4   4
200   0   1   1
300   6   1   4
400   1   8   8
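The lambda version translates the same way. A function passed to groupby is called once per label on the grouped axis, and after transposing those labels are the original column names. A sketch, again assuming the question's sample data:

```python
import pandas as pd

# Sample data from the question, with "id" as the index name.
df = pd.DataFrame(
    {
        "val1_aa": [6, 0, 5, 0],
        "val1_bb": [0, 1, 1, 3],
        "val2_aa": [3, 0, 1, 1],
        "val2_bb": [4, 0, 0, 5],
        "val2_cc": [3, 1, 4, 7],
        "val3_cc": [1, 0, 0, 1],
    },
    index=pd.Index([100, 200, 300, 400], name="id"),
)

# The lambda maps each original column name to its last "_"-separated piece.
result = df.T.groupby(lambda name: name.split("_")[-1]).sum().T
print(result)
```

split('_')[-1] takes the last underscore-separated piece and split('_')[1] the second one; they only give different keys for headers containing more than one underscore.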