Pandas groupby count non-null values as percentage

Tags:

pandas

Given this dataset, I would like to count the missing (NaN) values:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, np.nan, 2, 55, 6, np.nan, -17, np.nan],
                   'Team': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C': [4, 14, 3, 8, 8, 7, np.nan, 11],
                   'D': [np.nan, np.nan, -12, 12, 12, -12, np.nan, np.nan]})

Specifically, I want this count as a percentage for each group of the 'Team' column. I can get the raw count with:

df.groupby('Team').count()

This gives the number of non-missing values in each column for each team. What I would like instead is a percentage: each count divided by the total number of entries in its group. I don't know the group sizes in advance, and they are all different. I've tried using .agg(), but I can't seem to get what I want. How can I do this?
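Since .agg() was the approach tried, here is a minimal sketch of one way to make it work (not the asker's code): pass a function that receives each column's values within a group and returns the share of non-null entries. The mean of a boolean notna() mask is that share, because True counts as 1 and False as 0; multiplying by 100 turns it into a percentage.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, np.nan, 2, 55, 6, np.nan, -17, np.nan],
                   'Team': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C': [4, 14, 3, 8, 8, 7, np.nan, 11],
                   'D': [np.nan, np.nan, -12, 12, 12, -12, np.nan, np.nan]})

# agg() calls the lambda once per (group, column) pair with that column's
# values as a Series; notna().mean() is the fraction of non-missing entries.
pct = df.groupby('Team').agg(lambda s: s.notna().mean() * 100)
print(pct)
```

The lambda runs in Python for every group, so on very large frames a vectorized alternative may be faster.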


asked Nov 08 '17 01:11

J. Paul

1 Answer

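Building on your own code, divide the counts by the group sizes with .div(df.groupby('Team').size(), axis=0). Passing axis=0 aligns the sizes Series with the row index (the teams) rather than with the columns. The result is a fraction between 0 and 1; multiply by 100 for a percentage:

df.groupby('Team').count().div(df.groupby('Team').size(), axis=0)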
Out[190]: 
              A         C    D
Team                          
one    0.666667  0.666667  0.0
three  0.500000  1.000000  0.5
two    0.666667  1.000000  1.0
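A minimal alternative sketch (a swapped-in technique, not the method in this answer) avoids dividing two separate groupby results: build the boolean notna() mask first, group it by the Team labels, and take the mean. Using isna() instead gives the percentage of missing values, which is what the question's first sentence mentions. It assumes the same df as in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, np.nan, 2, 55, 6, np.nan, -17, np.nan],
                   'Team': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C': [4, 14, 3, 8, 8, 7, np.nan, 11],
                   'D': [np.nan, np.nan, -12, 12, 12, -12, np.nan, np.nan]})

values = df.drop(columns='Team')  # exclude the grouping column itself

# Group the True/False masks by the Team labels (aligned on the index);
# the per-group mean is a fraction, so * 100 gives a percentage.
non_null_pct = values.notna().groupby(df['Team']).mean() * 100
missing_pct = values.isna().groupby(df['Team']).mean() * 100

print(non_null_pct.round(2))
print(missing_pct.round(2))
```

The two frames add up to 100 in every cell, and non_null_pct should equal the answer's output above multiplied by 100.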

answered Sep 18 '22 14:09

BENY
