Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to group by all columns except a few in Python?

I want to group by several columns (around 40) and apply an aggregated sum. The 40 columns are the last 40 columns of the dataset. I haven't found an elegant way to do this.

I tried to group by all columns after the 'column_before' column.

df.groupby('grouping_variable')['column_before':].sum()

TypeError: unhashable type: 'slice'

The expected result is a dataframe with the last 40 columns aggregated with respect to 'grouping_variable' as a sum

like image 399
Andrés Acosta Escobar Avatar asked Oct 19 '25 00:10

Andrés Acosta Escobar


1 Answers

You can achieve it in various ways:

import pandas as pd


df = pd.DataFrame({
     'col_to_group': ['A', 'A', 'B', 'B', 'A'],
     'col_1': ['a', 'b', 'c', 'a', 'a'],
     'col_2': ['x', 'y', 'y', 'y','x'],
     'col_3': [.1, .2, .1, .9, .7],
     'col_4': [.1, .2, .1, .9, .7],
})


df.groupby('col_to_group')['col_3', 'col_4'].sum()

Just by selecting part of columns list (slice operator):

df.groupby('col_to_group')[df.columns[3:]].sum()
df.groupby('col_to_group')[df.columns[:2:-1]].sum().iloc[:, ::-1] 

Using name of column 'col_3' with iloc:

df.groupby('col_to_group')[df.loc[:, 'col_3':].columns].sum()

Simlarly like in list selection but with iloc:

df.groupby('col_to_group')[df.iloc[:, 3:].columns].sum()
df.groupby('col_to_group')[df.iloc[:, :2:-1].columns].sum().iloc[:, ::-1] 

Result:

              col_3  col_4
col_to_group              
A               1.0    1.0
B               1.0    1.0

like image 177
Quant Christo Avatar answered Oct 20 '25 13:10

Quant Christo