Pandas groupby agg std NaN

Tags: python, pandas, std, nan, pandas-groupby

Inputs:

(df['PopEst']
    .astype('float')
    .groupby(ContinentDict)
    .agg(['size','sum','mean','std']))

Outputs:

            size            sum                mean              std
Asia          5     2.898666e+09       5.797333e+08     6.790979e+08
Australia     1     2.331602e+07       2.331602e+07              NaN
Europe        6     4.579297e+08       7.632161e+07     3.464767e+07
North America 2     3.528552e+08       1.764276e+08     1.996696e+08
South America 1     2.059153e+08       2.059153e+08              NaN

Some values in the std column turn out to be NaN when the group has just one row, but I think these values are supposed to be 0. Why is that?

885

asked May 12 '18 13:05

Alex J

1 Answer

pd.DataFrame.std uses 1 delta degree of freedom (ddof=1) by default, which gives the sample standard deviation. The sum of squared deviations is divided by n - 1, so for a group with a single value the divisor is 0 and the result is NaN.

numpy.std, by contrast, uses ddof=0 by default, which gives the population standard deviation. The divisor is n, so a group with a single value gets 0.

To understand the difference between sample and population, see Bessel's correction.
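As a small sketch of the two conventions, the snippet below compares the defaults on a one-value and a two-value series. The values and variable names are made up for illustration and are not taken from the question's data.

```python
import numpy as np
import pandas as pd

# Illustrative values only; they are not taken from the question's data.
single = pd.Series([5.0])
pair = pd.Series([6.0, 7.0])

# pandas defaults to ddof=1: the divisor is n - 1, which is 0 for one value.
sample_single = single.std()   # NaN
sample_pair = pair.std()       # sqrt(0.5 / 1)

# NumPy defaults to ddof=0: the divisor is n.
population_single = np.std(single.to_numpy())   # 0.0
population_pair = np.std(pair.to_numpy())       # sqrt(0.5 / 2) = 0.5

print(sample_single, population_single)
print(sample_pair, population_pair)
```

For the two-value series the results differ as well (about 0.707 versus 0.5), so the choice of ddof affects every group, not only the single-row ones.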

Therefore, you can specify numpy.std for your calculation. Note, however, that the results for groups with more than one row will also change, because the divisor is n instead of n - 1. Here's a minimal example.

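import numpy as np
import pandas as pd

# Random sample data, so the values (and the output) vary between runs.
df = pd.DataFrame(np.random.randint(0, 9, (5, 2)))

# np.std uses ddof=0 by default (population standard deviation).
def std(x):
    return np.std(x)

res = df.groupby(0)[1].agg(['size', 'sum', 'mean', std])

print(res)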

   size  sum  mean       std
0                           
0     2   13   6.5       0.5
4     1    3   3.0       0.0
5     1    3   3.0       0.0
6     1    3   3.0       0.0
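The population formula can also be requested from pandas itself by passing ddof=0 to std. The following is a minimal sketch, assuming a small made-up frame; the column names key and value and the output labels sample_std and population_std are hypothetical. It uses named aggregation so both variants appear side by side.

```python
import pandas as pd

# Hypothetical data: group "a" has two rows, group "b" has one.
df = pd.DataFrame({"key": ["a", "a", "b"], "value": [6.0, 7.0, 3.0]})

res = df.groupby("key")["value"].agg(
    size="size",
    mean="mean",
    sample_std="std",                        # ddof=1: NaN for single-row groups
    population_std=lambda x: x.std(ddof=0),  # ddof=0: 0.0 for single-row groups
)

print(res)
```

Keeping both columns makes it easy to see which groups are affected before deciding which definition fits the analysis.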

Alternatively, if you want to keep the sample standard deviation (ddof=1) for the larger groups, you can use fillna to replace the NaN values with 0:

res = df.groupby(0)[1].agg(['size', 'sum', 'mean', 'std']).fillna(0)

108

answered Sep 20 '22 14:09

jpp
