Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Get percentages of a column based off of another column but with different categories

I have the following Pandas Series:

                    Count
Pclass  Survived    
1       0            80
        1           136
2       0            97
        1            87
3       0           372
        1           119

But I want something like this:

                    Count   Percentage
Pclass  Survived       
1       0           80      37.0
        1           136     63.0
2       0           97      52.7
        1           87      47.3
3       0           372     75.8
        1           119     24.2

I want to calculate the percentages based on only the counts in each category of Pclass (not the whole sum of counts). It would be great if these percentages are calculated using only the Count column.

So far, what I did was I summed up the counts according to Pclass and used .repeat() to duplicate the values and tried to concatenate it to the original Series, which failed spectacularly.

        Count
Pclass  
1       216
1       216
2       184
2       184
3       491
3       491

My idea was to use this repeat column as the denominators in calculating the percentages, like this:

80 / 216 * 100 = 37.0%

and then remove the repeat column after the percentages are calculated. Seems so simple but I can't seem to make it work. Any help is appreciated.

like image 205
luminare Avatar asked Sep 27 '18 05:09

luminare


2 Answers

div and sum with level=0

df.assign(Pct=df.div(df.sum(level=0), level=0).round(2) * 100)

                 Count   Pct
Pclass Survived             
1      0            80  37.0
       1           136  63.0
2      0            97  53.0
       1            87  47.0
3      0           372  76.0
       1           119  24.0
like image 116
piRSquared Avatar answered Sep 19 '22 22:09

piRSquared


Use GroupBy.transform for return Series with same size as original DataFrame, divide by div, multiple by mul and if necessary round:

#if input is Series create one column DataFrame
df = s.to_frame('Count')

s = df.groupby('Pclass')['Count'].transform('sum')
#if use oldier pandas version
#s = df.groupby(level='Pclass')['Count'].transform('sum')

df['Percentage'] = df['Count'].div(s).mul(100).round()
print (df)
                 Count  Percentage
Pclass Survived                   
1      0            80        37.0
       1           136        63.0
2      0            97        53.0
       1            87        47.0
3      0           372        76.0
       1           119        24.0

EDIT:

dont you have to forward fill on PClass ?

It is not necessary, because default in MultiIndex are not shown, but if set multi_sparse to False is possible verify repeating values:

with pd.option_context('display.multi_sparse', False):
    print (df)
                 Count
Pclass Survived       
1      0            80
1      1           136
2      0            97
2      1            87
3      0           372
3      1           119
like image 43
jezrael Avatar answered Sep 17 '22 22:09

jezrael