
Python / Pandas - Performance - Calculating % of incidence of a value in a column

I have this dataframe called target:

target:

          group
170  64.22-1-00
72   64.22-1-00
121  35.12-3-00
99   64.22-1-00
19   35.12-3-00

I want to create a new column called group_incidence that holds the relative frequency of each row's group in the dataframe. It is calculated like this:

[number of times that row's 'group' value appears in the group column] / len(target.index)

It would look like this:

          group   group_incidence 
170  64.22-1-00               0.6
72   64.22-1-00               0.6
121  35.12-3-00               0.4
99   64.22-1-00               0.6
19   35.12-3-00               0.4

I was able to do this with a for loop, but because the dataframe is large, it takes too long. I believe that skipping the for loop would give considerable performance gains.

Is there a way to perform that same operation without going through the for loop?

aabujamra asked Jan 30 '23 03:01


1 Answer

In [112]: df['group_incidence'] = df.groupby('group')['group'].transform('size') / len(df)    

In [113]: df
Out[113]:
          group group_incidence
170  64.22-1-00             0.6
72   64.22-1-00             0.6
121  35.12-3-00             0.4
99   64.22-1-00             0.6
19   35.12-3-00             0.4
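
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the dataframe is named target as in the question (the session above calls it df) and rebuilds the five sample rows by hand. The second variant, using value_counts(normalize=True) with map, is not part of the answer above; it is one possible alternative that should give the same column, and the name incidence_via_map is only illustrative.

```python
import pandas as pd

# Rebuild the sample frame from the question (index values copied from the post).
target = pd.DataFrame(
    {"group": ["64.22-1-00", "64.22-1-00", "35.12-3-00", "64.22-1-00", "35.12-3-00"]},
    index=[170, 72, 121, 99, 19],
)

# Same approach as the answer: the size of each row's group,
# broadcast back to every row, divided by the total number of rows.
target["group_incidence"] = (
    target.groupby("group")["group"].transform("size") / len(target)
)

# Alternative (an assumption, not from the answer): compute the relative
# frequency of each distinct group once, then map it back onto the rows.
incidence_via_map = target["group"].map(
    target["group"].value_counts(normalize=True)
)

print(target)
```

Both variants are vectorized, so neither needs an explicit Python loop over the rows. Which one is faster on a large frame probably depends on the data, so it is worth timing both.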
MaxU - stop WAR against UA answered Feb 02 '23 08:02