Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Get percentage based on groupby in Pandas?

Tags:

python

pandas

I have a pandas DataFrame like this:

subject bool Count
1   False   329232  
1   True    73896   
2   False   268338  
2   True    76424   
3   False   186167  
3   True    27078   
4   False   172417  
4   True    113268  

I would like to turn Count into percents for each subject group. So for example, row 1 would be 329232 / (329232 + 73896) = 0.816 and row 2 would be 73896/ (329232 + 73896) = 0.183. Then the total would change for group 2, and so on.

Is this possible to do by a groupby? I tried iterating over the rows with little success.

like image 950
max Avatar asked Feb 15 '19 22:02

max


2 Answers

This works for me:


import numpy as np
import pandas as pd

# data
df = pd.DataFrame({'subject': [1, 1, 2, 2, 3, 3, 4, 4],
                   'bool': [False, True, False, True, False, True, False, True],
                   'Count': [329232, 73896, 268338, 76424, 186167, 27078, 172417, 113268]})

# answer
df['Per_Subject_Count_Pct'] = df['Count'].div(
    df.groupby('subject')['Count'].transform(lambda x: x.sum()))
print(df)

Gives:

   subject   bool   Count  Per_Subject_Count_Pct
0        1  False  329232               0.816693
1        1   True   73896               0.183307
2        2  False  268338               0.778328
3        2   True   76424               0.221672
4        3  False  186167               0.873019
5        3   True   27078               0.126981
6        4  False  172417               0.603521
7        4   True  113268               0.396479
like image 138
BhishanPoudel Avatar answered Oct 09 '22 01:10

BhishanPoudel


My solution would be like this:

Importing relevant libraries

import pandas as pd
import numpy as np

Creating a Dataframe df

d = {'subject':[1,1,2,2,3,3],'bool':[False,True,False,True,False,True],
'count':[329232,73896,268338,76424,186167,27078]}
df = pd.DataFrame(d)

Using groupby and reset_index

table_sum= df.groupby('subject').sum().reset_index()[['subject','count']]

Zip the groupby output and make it asdictionary and get the frequency using map

look_1 = (dict(zip(table_sum['subject'],table_sum['count'])))
df['cu_sum'] = df['subject'].map(look_1)
df['relative_frequency'] = df['count']/df['cu_sum']

Output

print(df)

       subject   bool   count  cu_sum  relative_frequency
    0        1  False  329232  403128            0.816693
    1        1   True   73896  403128            0.183307
    2        2  False  268338  344762            0.778328
    3        2   True   76424  344762            0.221672
    4        3  False  186167  213245            0.873019
    5        3   True   27078  213245            0.126981
like image 38
Ryan Ni Avatar answered Oct 09 '22 00:10

Ryan Ni