I have a dataframe that looks like
day type col d_1 d_2 d_3 d_4 d_5...
1 A 1 1 0 1 0
1 A 2 1 0 1 0
2 B 1 1 1 0 0
That is, I have one normal column (col) and many columns prefixed by d_.
I need to group by day and type and compute the sum of the values in every d_ column for each day-type combination. I also need to apply other aggregation functions to the other columns in my data (such as col in the example).
I can use:
agg_df=df.groupby(['day','type']).agg({'d_1': 'sum', 'col': 'mean'})
but this computes the sum only for one d_ column. How can I specify all the possible d_ columns in my data?
In other words, I would like to write something like
agg_df=df.groupby(['day','type']).agg({'d_*': 'sum', 'col': 'mean'})
so that the expected output is:
day type col d_1 d_2 d_3 d_4 d_5...
1 A 1.5 2 0 2 0 ...
2 B 1 1 1 0 0
As you can see, col is aggregated by mean, while the d_ columns are summed.
Thanks for your help!
IIUC, you need to subset the grouped dataframe to your d_* columns. You can find those columns with str.contains and pass them to the groupby object:
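# Select the d_* columns plus col by regex; the non-capturing group (?:...)
# avoids the pandas warning about match groups in str.contains
cols = df.columns[df.columns.str.contains('(?:d_)+|col')]
agg_df = df.groupby(['day', 'type'])[cols].sum()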
In [150]: df
Out[150]:
day type col d_1 d_2 d_3 d_4
0 1 A 1 1 0 1 0
1 1 A 2 1 0 1 0
2 2 B 1 1 1 0 0
In [155]: agg_df
Out[155]:
col d_1 d_2 d_3 d_4
day type
1 A 3 2 0 2 0
2 B 1 1 1 0 0
Note: I added the col
columns to the contains
pattern as you requested. You could specify whatever regex expression you want and pass it with |
symbol.
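Since the approach above applies a single function to every selected column, one way to get the sum of the d_ columns and the mean of col in a single pass is to build the agg dictionary programmatically. This is a minimal sketch, assuming the sample data from the question and that every indicator column name starts with d_; agg_spec and d_cols are just illustrative names:

```python
import pandas as pd

# Sample data mirroring the example in the question
df = pd.DataFrame({
    'day':  [1, 1, 2],
    'type': ['A', 'A', 'B'],
    'col':  [1, 2, 1],
    'd_1':  [1, 1, 1],
    'd_2':  [0, 0, 1],
    'd_3':  [1, 1, 0],
    'd_4':  [0, 0, 0],
})

# Collect every column whose name starts with d_
d_cols = [c for c in df.columns if c.startswith('d_')]

# Map each d_* column to 'sum', then add the per-column exceptions
agg_spec = {c: 'sum' for c in d_cols}
agg_spec['col'] = 'mean'

agg_df = df.groupby(['day', 'type']).agg(agg_spec).reset_index()

# Put col back in front of the d_* columns, as in the expected output
agg_df = agg_df[['day', 'type', 'col'] + d_cols]
print(agg_df)
```

Other columns can be given their own functions by adding further entries to agg_spec before calling agg.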