Is there any way of making a sum on the columns after grouping in pandas data frame? For example I have the following data frame:
ID W_1 W_2 W_3
1 0.1 0.2 0.3
1 0.2 0.4 0.5
2 0.3 0.3 0.2
2 0.1 0.3 0.4
2 0.2 0.0 0.5
1 0.5 0.3 0.2
1 0.4 0.2 0.1
I want to have an extra column called "my_sum" that sums the first row in all columns (W_1, W_2, W_3). The output would be something like this:
ID W_1 W_2 W_3 my_sum
1 0.1 0.2 0.3 0.6
1 0.2 0.4 0.5 1.1
2 0.3 0.3 0.2 0.8
2 0.1 0.3 0.4 0.8
2 0.2 0.0 0.5 0.7
1 0.5 0.3 0.2 1.0
1 0.4 0.2 0.1 0.7
I tred the following:
df['my_sum'] = df.groupby('ID')['W_1','W_1','W_1'].transform(sum,axis=1)
but this sums all entries of just W_1. The documentation mentions the axis parmeter, but I am not sure why it is not effective.
I looked into this question and also this, but they are different from what I want.
The thing that remains the same is the .sum(1)
. Here are some inventive alternatives to answers already posted.
df.select_dtypes
df['my_sum'] = df.select_dtypes(float).sum(1)
df
ID W_1 W_2 W_3 my_sum
0 1 0.1 0.2 0.3 0.6
1 1 0.2 0.4 0.5 1.1
2 2 0.3 0.3 0.2 0.8
3 2 0.1 0.3 0.4 0.8
4 2 0.2 0.0 0.5 0.7
5 1 0.5 0.3 0.2 1.0
6 1 0.4 0.2 0.1 0.7
df.iloc
df['my_sum'] = df.iloc[:, 1:].sum(1)
df
ID W_1 W_2 W_3 my_sum
0 1 0.1 0.2 0.3 0.6
1 1 0.2 0.4 0.5 1.1
2 2 0.3 0.3 0.2 0.8
3 2 0.1 0.3 0.4 0.8
4 2 0.2 0.0 0.5 0.7
5 1 0.5 0.3 0.2 1.0
6 1 0.4 0.2 0.1 0.7
This is obnoxiously exploiting your data.
df['my_sum'] = df[df < 1].sum(1)
df
ID W_1 W_2 W_3 my_sum
0 1 0.1 0.2 0.3 0.6
1 1 0.2 0.4 0.5 1.1
2 2 0.3 0.3 0.2 0.8
3 2 0.1 0.3 0.4 0.8
4 2 0.2 0.0 0.5 0.7
5 1 0.5 0.3 0.2 1.0
6 1 0.4 0.2 0.1 0.7
DataFrame.sum
, or numpy.sum
Filter on the column names using str.contains
:
df.iloc[:, df.columns.str.contains('W_')].sum(1)
df
ID W_1 W_2 W_3 my_sum
0 1 0.1 0.2 0.3 0.6
1 1 0.2 0.4 0.5 1.1
2 2 0.3 0.3 0.2 0.8
3 2 0.1 0.3 0.4 0.8
4 2 0.2 0.0 0.5 0.7
5 1 0.5 0.3 0.2 1.0
6 1 0.4 0.2 0.1 0.7
Alternatively try summing on a numpy array directly, for performance:
df['my_sum'] = df.values[:, 1:].sum(1)
df
ID W_1 W_2 W_3 my_sum
0 1 0.1 0.2 0.3 0.6
1 1 0.2 0.4 0.5 1.1
2 2 0.3 0.3 0.2 0.8
3 2 0.1 0.3 0.4 0.8
4 2 0.2 0.0 0.5 0.7
5 1 0.5 0.3 0.2 1.0
6 1 0.4 0.2 0.1 0.7
In [7]: df['my_sum'] = df.drop('ID',1).sum(axis=1)
In [8]: df
Out[8]:
ID W_1 W_2 W_3 my_sum
0 1 0.1 0.2 0.3 0.6
1 1 0.2 0.4 0.5 1.1
2 2 0.3 0.3 0.2 0.8
3 2 0.1 0.3 0.4 0.8
4 2 0.2 0.0 0.5 0.7
5 1 0.5 0.3 0.2 1.0
6 1 0.4 0.2 0.1 0.7
or:
In [9]: df['my_sum'] = df.filter(regex='^W_\d+').sum(axis=1)
In [10]: df
Out[10]:
ID W_1 W_2 W_3 my_sum
0 1 0.1 0.2 0.3 0.6
1 1 0.2 0.4 0.5 1.1
2 2 0.3 0.3 0.2 0.8
3 2 0.1 0.3 0.4 0.8
4 2 0.2 0.0 0.5 0.7
5 1 0.5 0.3 0.2 1.0
6 1 0.4 0.2 0.1 0.7
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With