
pandas divide row value by aggregated sum with a condition set by other cell

Hi, hoping to get some help. I have a two-column DataFrame df:

Source ID
1      2
2      3
1      2
1      2
1      3
3      1

My intention is to group by Source, divide each ID cell by the total ID of its Source group, and attach the result to the original dataframe, so the new column would look like:

   Source ID  ID_new
    1      2  2/9
    2      3  3/3
    1      2  2/9
    1      2  2/9
    1      3  3/9
    3      1  1/1

I've gotten as far as:

df.groupby('Source')['ID'].sum()

to get the total of ID per Source, but I'm not sure where to go next.

user3191569 asked Feb 07 '23 04:02



1 Answer

Try this, which divides each ID by the sum of ID within its Source group:

In [79]: df.assign(ID_new=df.ID/df.groupby('Source').ID.transform('sum'))
Out[79]:
   Source  ID    ID_new
0       1   2  0.222222
1       2   3  1.000000
2       1   2  0.222222
3       1   2  0.222222
4       1   3  0.333333
5       3   1  1.000000
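
In case it helps to see why transform('sum') works where a plain sum() does not line up with the rows, here is a minimal sketch. It rebuilds the question's sample data by hand; the variable names group_totals and row_totals are only for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"Source": [1, 2, 1, 1, 1, 3],
                   "ID": [2, 3, 2, 2, 3, 1]})

# Aggregation: one value per group, indexed by Source
group_totals = df.groupby("Source")["ID"].sum()

# Transform: the group total repeated for every original row,
# indexed like df, so it can be used directly as a divisor
row_totals = df.groupby("Source")["ID"].transform("sum")

print(group_totals.to_dict())  # {1: 9, 2: 3, 3: 1}
print(row_totals.tolist())     # [9, 3, 9, 9, 9, 1]
```

Because row_totals shares df's index, df.ID / row_totals divides row by row without any merge or map step.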

assign returns a new DataFrame and leaves df unchanged. If you need ID_new as a persistent column in df, assign it directly, as @jezrael proposed in the comments:

In [81]: df['ID_new'] = df.ID/df.groupby('Source').ID.transform('sum')

In [82]: df
Out[82]:
   Source  ID    ID_new
0       1   2  0.222222
1       2   3  1.000000
2       1   2  0.222222
3       1   2  0.222222
4       1   3  0.333333
5       3   1  1.000000
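
As a quick sanity check, the shares within each Source should add up to 1. The following is a self-contained sketch on the question's sample data; share_totals is just an illustrative name, and the rounding is only there to hide floating-point noise.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"Source": [1, 2, 1, 1, 1, 3],
                   "ID": [2, 3, 2, 2, 3, 1]})

# Same expression as above, stored as a persistent column
df["ID_new"] = df["ID"] / df.groupby("Source")["ID"].transform("sum")

# Each group's shares should sum to 1
share_totals = df.groupby("Source")["ID_new"].sum()
print(share_totals.round(6).tolist())  # [1.0, 1.0, 1.0]
```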
MaxU - stop WAR against UA answered Feb 08 '23 17:02
