Given this dataset, I would like to count the missing (NaN) values:
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, np.nan, 2, 55, 6, np.nan, -17, np.nan],
                   'Team': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C': [4, 14, 3, 8, 8, 7, np.nan, 11],
                   'D': [np.nan, np.nan, -12, 12, 12, -12, np.nan, np.nan]})
Specifically, I want the count (as a percentage) per group in the 'Team' column. I can get the raw count with:
df.groupby('Team').count()
This returns the number of non-missing values in each group. What I would like instead is a percentage: rather than the raw number, I want the count as a share of the total entries in each group (the groups are uneven and I don't know their sizes in advance). I've tried using .agg(), but I can't seem to get what I want. How can I do this?
You can calculate a percentage of the total within each group of a pandas DataFrame by using the DataFrame.groupby(), DataFrame.agg(), and DataFrame.transform() methods.
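As a rough illustration of how agg() and transform() could be applied to the question's data, here is a minimal sketch. The names frac_present and group_size are made up for this example, and the lambda is just one way to express "share of non-missing values".

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, np.nan, 2, 55, 6, np.nan, -17, np.nan],
                   'Team': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C': [4, 14, 3, 8, 8, 7, np.nan, 11],
                   'D': [np.nan, np.nan, -12, 12, 12, -12, np.nan, np.nan]})

# agg(): for every column in every team, the mean of the "is present" flags
# equals the fraction of non-missing entries in that group.
frac_present = df.groupby('Team').agg(lambda s: s.notna().mean())

# transform(): broadcast each team's row count (NaN rows included)
# back onto the original rows.
group_size = df.groupby('Team')['C'].transform('size')

print(frac_present)
print(group_size.tolist())
```

agg() collapses each group to one row, while transform() keeps the original shape, which is handy when the group size is needed next to every row.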
Note that groupby excludes rows whose group key is NaN by default; to keep those rows as their own group, pass dropna=False to groupby.
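The question's 'Team' column has no missing keys, so the following sketch uses a small made-up frame (toy) to show the difference. It assumes pandas 1.1 or later, where the dropna parameter of groupby is available.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the grouping column itself contains a NaN.
toy = pd.DataFrame({'Team': ['one', np.nan, 'one', 'two'],
                    'A': [1, 2, np.nan, 4]})

# Default behaviour: the row with a NaN key is left out of every group.
default_sizes = toy.groupby('Team').size()

# dropna=False keeps the NaN key as a group of its own.
with_nan_sizes = toy.groupby('Team', dropna=False).size()

print(default_sizes)
print(with_nan_sizes)
```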
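Based on your own code, divide the counts by the group sizes by adding .div(df.groupby('Team').size(), 0). The result is the fraction of non-missing values in each group (multiply by 100 for a percentage):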
df.groupby('Team').count().div(df.groupby('Team').size(),0)
Out[190]:
              A         C    D
Team
one    0.666667  0.666667  0.0
three  0.500000  1.000000  0.5
two    0.666667  1.000000  1.0
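Since the question is about missing values, the complement of the table above may be closer to what is wanted. A minimal sketch, assuming the same df as in the question (missing_pct is a name chosen for this example): flag the NaN cells, group the flags by team, and take the mean.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, np.nan, 2, 55, 6, np.nan, -17, np.nan],
                   'Team': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C': [4, 14, 3, 8, 8, 7, np.nan, 11],
                   'D': [np.nan, np.nan, -12, 12, 12, -12, np.nan, np.nan]})

# isna() marks missing cells as True; the per-team mean of those flags is the
# fraction missing, and multiplying by 100 turns it into a percentage.
missing_pct = df.drop(columns='Team').isna().groupby(df['Team']).mean() * 100

print(missing_pct.round(2))
```

For this data, team 'one' comes out as 100% missing in column D, and team 'two' as 0% missing in columns C and D, which matches 1 minus the fractions shown above.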