Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to sum values grouped by a categorical column in pandas?

I have data which has a categorical column that groups the data and other columns likes this in a dataframe df.

id      subid      value
1       10         1.5
1       20         2.5
1       30         7.0 
2       10         12.5
2       40         5

What I need is a column that has the average value for each subid within each id. For example df could be:

id      subid      value     id_sum    proportion
1       10         1.5       11.0      0.136
1       20         2.5       11.0      0.227
1       30         7.0       11.0      0.636
2       10         12.5      17.5      0.714
2       40         5         17.5      0.285

Now, I tried getting the id_sum column by doing:

df['id_sum'] = df.groupby('id')['value'].sum()

But this does not seem to work as hoped. My end goal is to get the proportion column. What is the correct way of getting that?

like image 627
sfactor Avatar asked Oct 12 '16 12:10

sfactor


1 Answers

here we go

df['id_sum'] = df.groupby('id')['value'].transform('sum')
df['proportion'] = df['value'] / df['id_sum']
like image 198
ℕʘʘḆḽḘ Avatar answered Sep 27 '22 23:09

ℕʘʘḆḽḘ