I have this dataframe called target:
target:
group
170 64.22-1-00
72 64.22-1-00
121 35.12-3-00
99 64.22-1-00
19 35.12-3-00
I want to create a new column called group_incidence that holds the relative frequency of each row's group, i.e. the share of rows in the dataframe that belong to that group. It is calculated like this:
[total number of times that 'group' value appears in the group column]/len(target.index)
It would look like this:
group group_incidence
170 64.22-1-00 0.6
72 64.22-1-00 0.6
121 35.12-3-00 0.4
99 64.22-1-00 0.6
19 35.12-3-00 0.4
I was able to do that with a for loop, but since the dataframe is large, it takes too long. I believe that avoiding the for loop would give a considerable performance gain.
Is there a way to perform the same operation without a for loop?
In [112]: df['group_incidence'] = df.groupby('group')['group'].transform('size') / len(df)
In [113]: df
Out[113]:
group group_incidence
170 64.22-1-00 0.6
72 64.22-1-00 0.6
121 35.12-3-00 0.4
99 64.22-1-00 0.6
19 35.12-3-00 0.4
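For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the sample frame from the question under the name target (the session above calls it df), applies the same groupby/transform expression, and also shows a value_counts(normalize=True) plus map variant, which is an alternative I am adding for comparison and not part of the answer above. The variable names shares and via_map are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question; the index values are the row labels shown there.
target = pd.DataFrame(
    {"group": ["64.22-1-00", "64.22-1-00", "35.12-3-00", "64.22-1-00", "35.12-3-00"]},
    index=[170, 72, 121, 99, 19],
)

# Same expression as in the answer: size of each row's group, divided by the row count.
target["group_incidence"] = (
    target.groupby("group")["group"].transform("size") / len(target)
)

# Alternative (an assumption, not from the answer): compute each group's share once,
# then map those shares back onto the rows.
shares = target["group"].value_counts(normalize=True)
via_map = target["group"].map(shares)

print(target)
```

Both variants are vectorized, so neither needs an explicit Python loop over the rows. Which one is faster on a given dataset is something to measure, not assume.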