
Pandas groupby count non-null values as percentage

Tags:

python

pandas

Given this dataset, I would like to count the missing (NaN) values:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, np.nan, 2, 55, 6, np.nan, -17, np.nan],
                   'Team': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C': [4, 14, 3, 8, 8, 7, np.nan, 11],
                   'D': [np.nan, np.nan, -12, 12, 12, -12, np.nan, np.nan]})

Specifically, I want the count as a percentage per group in the 'Team' column. I can get the raw count with:

df.groupby('Team').count()

This returns the number of non-missing values in each column for each group. What I would like instead is a percentage: the non-missing count divided by the total number of entries in each group. I don't know the group sizes in advance, and they are uneven. I've tried using .agg(), but I can't seem to get what I want. How can I do this?

J. Paul asked Nov 08 '17 01:11



1 Answer

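Building on your own code, divide the counts by each group's size with div(df.groupby('Team').size(), 0). count() gives the non-null entries per column, size() gives the total rows per group (NaN included), and the 0 aligns the division on the row index: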

df.groupby('Team').count().div(df.groupby('Team').size(),0)
Out[190]: 
              A         C    D
Team                          
one    0.666667  0.666667  0.0
three  0.500000  1.000000  0.5
two    0.666667  1.000000  1.0
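
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The variable names (grouped, share, share_alt, pct) are my own, and the notna().mean() variant is an alternative I am adding for comparison; it is not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, np.nan, 2, 55, 6, np.nan, -17, np.nan],
                   'Team': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C': [4, 14, 3, 8, 8, 7, np.nan, 11],
                   'D': [np.nan, np.nan, -12, 12, 12, -12, np.nan, np.nan]})

# Reuse one groupby object for both aggregations.
grouped = df.groupby('Team')

# count(): non-null entries per column; size(): total rows per group.
# axis=0 lines the group sizes up with the 'Team' index of the counts.
share = grouped.count().div(grouped.size(), axis=0)

# Alternative (an assumption of mine, not from the answer): the mean of a
# boolean "is not null" mask is the fraction of non-null values.
share_alt = df.drop(columns='Team').notna().groupby(df['Team']).mean()

# Scale the fractions to percentages if that is the preferred display.
pct = (share * 100).round(1)
print(pct)
```

Both forms should give the same fractions; the first stays closest to the original count() call, while the second avoids computing the group sizes separately.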
BENY answered Sep 18 '22 14:09