Pandas DataFrame.describe(): which kind of standard deviation?

In Python's pandas library, the DataFrame.describe() method reports the standard deviation of each numeric column. However, the documentation page doesn't specify whether this is the "uncorrected" (population) standard deviation or the "corrected" (sample) standard deviation.

Can someone tell me which one it returns?

Asked Jan 09 '23 21:01 by hlin117



1 Answer

It's the corrected sample standard deviation.
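For reference, these are the two definitions the question contrasts, for n values x_i with mean x̄. The corrected version applies Bessel's correction, dividing by n - 1 instead of n; pandas exposes this choice as the ddof argument of std(), which defaults to ddof=1. The last line works the Series [1, 2] used below by hand:

```latex
\sigma_{\text{uncorrected}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2},
\qquad
s_{\text{corrected}} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}

x = (1, 2):\quad \bar{x} = 1.5,\quad \sum_{i}(x_i - \bar{x})^2 = 0.5,\quad
s_{\text{corrected}} = \sqrt{0.5/1} \approx 0.7071,\quad
\sigma_{\text{uncorrected}} = \sqrt{0.5/2} = 0.5
```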
You can convince yourself of this by applying the formula to a simple Series:

In [10]: import pandas as pd

In [11]: s = pd.Series([1, 2])

In [12]: s.std()
Out[12]: 0.70710678118654757

In [13]: from math import sqrt
   ....:  sqrt(0.5)
Out[13]: 0.7071067811865476

and applying the formula for the corrected sample standard deviation directly gives the same value:

In [14]: sqrt(1./(len(s)-1) * ((s - s.mean()) ** 2).sum())
Out[14]: 0.7071067811865476
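The session above checks Series.std(); to check describe() itself, here is a minimal sketch that compares the "std" row of describe() against DataFrame.std() at its default ddof=1 and at ddof=0. The DataFrame is made-up illustration data. As far as I know describe() has no ddof parameter, so if you want the uncorrected value you have to compute it separately with std(ddof=0).

```python
import numpy as np
import pandas as pd

# Hypothetical example data; any numeric columns would do
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2.0, 4.0, 6.0, 9.0]})

# The "std" row that describe() reports for each numeric column
described_std = df.describe().loc["std"]

corrected_std = df.std()          # default ddof=1: divides by n - 1
uncorrected_std = df.std(ddof=0)  # ddof=0: divides by n

# describe() agrees with the corrected (ddof=1) version only
print(np.allclose(described_std, corrected_std))    # True
print(np.allclose(described_std, uncorrected_std))  # False
```

Note that pandas' std() defaults to ddof=1, whereas NumPy's np.std() defaults to ddof=0, a common source of confusion when results from the two libraries are compared.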
Answered Jan 21 '23 13:01 by Andy Hayden
