I want to group by several columns (around 40) and apply an aggregated sum. The 40 columns are the last 40 columns of the dataset. I haven't found an elegant way to do this.
I tried to group by all columns after the 'column_before' column.
df.groupby('grouping_variable')['column_before':].sum()
TypeError: unhashable type: 'slice'
The expected result is a dataframe with the last 40 columns aggregated with respect to 'grouping_variable' as a sum
You can achieve it in various ways:
import pandas as pd
df = pd.DataFrame({
'col_to_group': ['A', 'A', 'B', 'B', 'A'],
'col_1': ['a', 'b', 'c', 'a', 'a'],
'col_2': ['x', 'y', 'y', 'y','x'],
'col_3': [.1, .2, .1, .9, .7],
'col_4': [.1, .2, .1, .9, .7],
})
df.groupby('col_to_group')['col_3', 'col_4'].sum()
Just by selecting part of columns list (slice operator):
df.groupby('col_to_group')[df.columns[3:]].sum()
df.groupby('col_to_group')[df.columns[:2:-1]].sum().iloc[:, ::-1]
Using name of column 'col_3'
with iloc
:
df.groupby('col_to_group')[df.loc[:, 'col_3':].columns].sum()
Simlarly like in list selection but with iloc
:
df.groupby('col_to_group')[df.iloc[:, 3:].columns].sum()
df.groupby('col_to_group')[df.iloc[:, :2:-1].columns].sum().iloc[:, ::-1]
Result:
col_3 col_4
col_to_group
A 1.0 1.0
B 1.0 1.0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With