How to sum across many columns with pandas groupby?

Tags: python, pandas

I have a dataframe that looks like this:

day  type  col  d_1  d_2  d_3  d_4  d_5...
1    A     1    1    0    1    0
1    A     2    1    0    1    0
2    B     1    1    1    0    0

That is, I have one normal column (col) and many columns prefixed by d_.

I need to group by day and type and compute the sum of every d_ column for each day-type combination. I also need to apply other aggregation functions to the remaining columns in my data (such as col in the example).

I can use:

agg_df=df.groupby(['day','type']).agg({'d_1': 'sum', 'col': 'mean'})

but this computes the sum only for one d_ column. How can I specify all the possible d_ columns in my data?

In other words, I would like to write something like

agg_df=df.groupby(['day','type']).agg({'d_*': 'sum', 'col': 'mean'})

so that the expected output is:

day  type  col  d_1  d_2  d_3  d_4  d_5...
1    A     1.5  2    0    2    0    ...
2    B     1    1    1    0    0

As you can see, col is aggregated by mean, while the d_ columns are summed.
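Because agg matches dictionary keys against exact column names, a wildcard key such as 'd_*' is not expanded. A minimal sketch of one workaround, assuming the example data above (only d_1 to d_4) and that every column to be summed starts with d_, is to build the aggregation dictionary programmatically; the name agg_spec is illustrative:

```python
import pandas as pd

# Sample frame reconstructed from the example (assumption: only d_1..d_4).
df = pd.DataFrame({
    "day": [1, 1, 2],
    "type": ["A", "A", "B"],
    "col": [1, 2, 1],
    "d_1": [1, 1, 1],
    "d_2": [0, 0, 1],
    "d_3": [1, 1, 0],
    "d_4": [0, 0, 0],
})

# Start with the aggregations for the "normal" columns...
agg_spec = {"col": "mean"}
# ...then expand the wished-for 'd_*' key into one 'sum' entry per d_ column.
agg_spec.update({c: "sum" for c in df.columns if c.startswith("d_")})

agg_df = df.groupby(["day", "type"]).agg(agg_spec)
print(agg_df)
```

New d_ columns are picked up automatically, since the dictionary is rebuilt from df.columns each time.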

Thanks for your help!

ℕʘʘḆḽḘ asked Feb 08 '16 12:02


1 Answer

IIUC, you need to subset the grouped dataframe with your d_* columns. You can find those columns with str.contains and pass them to the groupby object:

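# anchored pattern: columns starting with d_, plus exactly col (no capture groups, so no match-group warning)
cols = df.columns[df.columns.str.contains(r'^d_|^col$')]
agg_df = df.groupby(['day', 'type'])[cols].sum()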


In [150]: df
Out[150]:
   day type  col  d_1  d_2  d_3  d_4
0    1    A    1    1    0    1    0
1    1    A    2    1    0    1    0
2    2    B    1    1    1    0    0

In [155]: agg_df
Out[155]:
          col  d_1  d_2  d_3  d_4
day type
1   A       3    2    0    2    0
2   B       1    1    1    0    0

Note: I added col to the pattern, so col is summed along with the d_ columns (3 for day 1, type A, rather than the mean of 1.5 in your expected output). You can extend the regex with further alternatives separated by |.
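If col should be averaged rather than summed, as in the question's expected output, one option is to keep the regex-based selection for the d_ columns and aggregate the two sets of columns separately, then join the results. A minimal sketch, assuming the example data above and that every column to be summed starts with d_; the names grouped and d_cols are illustrative:

```python
import pandas as pd

# Sample frame from the example above.
df = pd.DataFrame({
    "day": [1, 1, 2],
    "type": ["A", "A", "B"],
    "col": [1, 2, 1],
    "d_1": [1, 1, 1],
    "d_2": [0, 0, 1],
    "d_3": [1, 1, 0],
    "d_4": [0, 0, 0],
})

grouped = df.groupby(["day", "type"])

# Columns whose name starts with "d_" (anchored regex without capture groups).
d_cols = list(df.columns[df.columns.str.contains(r"^d_")])

# Average col, sum the d_ columns, and place the results side by side.
agg_df = pd.concat([grouped["col"].mean(), grouped[d_cols].sum()], axis=1)
print(agg_df)
```

Both partial results are indexed by (day, type), so pd.concat(..., axis=1) aligns them row by row.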

Anton Protopopov answered Sep 30 '22 13:09