Python / Pandas - Performance - Calculating % of incidence of a value in a column

Question

I have this dataframe called target:

target:

          group
170  64.22-1-00
72   64.22-1-00
121  35.12-3-00
99   64.22-1-00
19   35.12-3-00

I want to create a new column called group_incidence that holds the relative frequency of each row's group in the dataframe. It is calculated like this:

[number of times that row's group appears in the group column] / len(target.index)

It would look like this:

          group   group_incidence 
170  64.22-1-00               0.6
72   64.22-1-00               0.6
121  35.12-3-00               0.4
99   64.22-1-00               0.6
19   35.12-3-00               0.4

I was able to do this with a for loop, but the dataframe is large and the loop takes too long. I believe that avoiding the loop would give a considerable performance gain.

Is there a way to perform the same operation without a for loop?

MaxU - stop WAR against UA · Accepted Answer

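Use groupby with transform('size') to get, for every row, the number of rows in its group, then divide by the total number of rows (df here stands for the question's target):

In [112]: df['group_incidence'] = df.groupby('group')['group'].transform('size') / len(df)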

In [113]: df
Out[113]:
          group group_incidence
170  64.22-1-00             0.6
72   64.22-1-00             0.6
121  35.12-3-00             0.4
99   64.22-1-00             0.6
19   35.12-3-00             0.4
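
For comparison, here is a minimal self-contained sketch of the same calculation, plus one possible alternative that is not part of the answer above: `value_counts(normalize=True)` gives the share of each distinct value, and `map` broadcasts those shares back onto the rows. The sample frame is rebuilt from the question's data, and the names `via_transform` and `shares` are only illustrative. Which variant is faster likely depends on the data, so it is worth timing both on the real dataframe.

```python
import pandas as pd

# Sample data copied from the question.
target = pd.DataFrame(
    {"group": ["64.22-1-00", "64.22-1-00", "35.12-3-00", "64.22-1-00", "35.12-3-00"]},
    index=[170, 72, 121, 99, 19],
)

# Approach from the answer: per-row group size divided by the number of rows.
via_transform = target.groupby("group")["group"].transform("size") / len(target)

# Alternative sketch (an assumption, not from the answer): compute the
# relative frequency of each distinct value once, then map it onto the rows.
shares = target["group"].value_counts(normalize=True)
target["group_incidence"] = target["group"].map(shares)

print(target)
```

Both variants avoid an explicit Python loop over rows, which is where the original for-loop approach spends its time.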

Tags: performance, python, pandas, vectorization

aabujamra
