How to use groupby in pandas to calculate a percentage / proportion total based on a criteria in another column

I'm trying to use the groupby function in pandas to calculate the proportion of values per year for each value of a Yes/No criterion.

For example, I have a dataframe called names:

    Name  Number  Year   Sex Criteria
0  name1     789  1998  Male        N
1  name1     688  1999  Male        N
2  name1     639  2000  Male        N
3  name2     551  1998  Male        Y
4  name2     499  1999  Male        Y

I can use

namesgrouped = names.groupby(["Sex", "Year", "Criteria"]).sum()

to get:

                    Number
Sex  Year Criteria
Male 1998 N          14507
          Y           2308
     1999 N          14119
          Y           2331

and so on. I would like the 'Number' column to show each Criteria value's percentage of the total for that sex and year, so instead of N = 14507 and Y = 2308 for 1998 above I'd have N = 86.27% and Y = 13.73%.

Can anyone advise how to do this?

asked May 02 '16 17:05

fuzzy_logic_77

1 Answer

This question is a direct extension of the suggested duplicate. Borrowing from the accepted answer, this will work:

In [46]: namesgrouped.groupby(level=[0, 1]).apply(lambda g: g / g.sum())
Out[46]: 
                      Number
Sex  Year Criteria          
Male 1998 N         0.588806
          Y         0.411194
     1999 N         0.579612
          Y         0.420388
     2000 N         1.000000
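
A note for newer pandas releases (2.0 and later, as far as I can tell): `apply` may prepend the group keys to the result, so `Sex` and `Year` show up twice in the index. The sketch below rebuilds the five sample rows from the question and passes `group_keys=False` to keep the original three-level index. Selecting `[["Number"]]` before `sum()` is my addition, so that the string column `Name` is not aggregated.

```python
import pandas as pd

# The five sample rows shown in the question.
names = pd.DataFrame({
    "Name": ["name1", "name1", "name1", "name2", "name2"],
    "Number": [789, 688, 639, 551, 499],
    "Year": [1998, 1999, 2000, 1998, 1999],
    "Sex": ["Male"] * 5,
    "Criteria": ["N", "N", "N", "Y", "Y"],
})

# Sum only the numeric column (assumption: "Name" should not be aggregated).
namesgrouped = names.groupby(["Sex", "Year", "Criteria"])[["Number"]].sum()

# group_keys=False keeps the (Sex, Year, Criteria) index unchanged.
shares = namesgrouped.groupby(level=[0, 1], group_keys=False).apply(
    lambda g: g / g.sum()
)
print(shares)
```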

Edit: a transform operation might be faster than apply:

namesgrouped / namesgrouped.groupby(level=[0, 1]).transform('sum')
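
To get percentages rather than fractions, the same division can be scaled by 100. This is a minimal end-to-end sketch using the five sample rows from the question; the variable names `totals` and `percent` and the rounding to two decimals are my own choices, not part of the original answer.

```python
import pandas as pd

# The five sample rows shown in the question.
names = pd.DataFrame({
    "Name": ["name1", "name1", "name1", "name2", "name2"],
    "Number": [789, 688, 639, 551, 499],
    "Year": [1998, 1999, 2000, 1998, 1999],
    "Sex": ["Male"] * 5,
    "Criteria": ["N", "N", "N", "Y", "Y"],
})

# Sum only the numeric column (assumption: "Name" should not be aggregated).
namesgrouped = names.groupby(["Sex", "Year", "Criteria"])[["Number"]].sum()

# transform("sum") returns the Sex/Year total repeated on every row of the
# group, so it lines up with namesgrouped and can be used as a divisor.
totals = namesgrouped.groupby(level=["Sex", "Year"]).transform("sum")

# Share of each Criteria value within its Sex/Year group, as a percentage.
percent = (namesgrouped / totals * 100).round(2)
print(percent)
```

Because `transform` preserves the shape and index of its input, this version avoids the index handling that `apply` needs.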

answered Nov 15 '22 15:11

IanS

Tags: python, pandas, dataframe, group-by, pivot
