I am currently reimplementing an algorithm, originally written in Java, in Python. One step is to calculate the standard deviation of a list of values. The original implementation uses DescriptiveStatistics.getStandardDeviation
from the Apache Math 1.1 library; I use the standard deviation function from numpy 1.5. The problem is that they give (very) different results for the same input. The sample I have is this:
[0.113967640255, 0.223095775796, 0.283134228235, 0.416793887842]
I get the following results:
numpy : 0.10932134388775223
Apache Math 1.1 : 0.12620366805397404
Wolfram Alpha : 0.12620366805397404
I checked with Wolfram Alpha to get a third opinion. I do not think that a difference this large can be explained by floating-point precision alone. Does anyone have any idea why this is happening, and what I can do about it?
Edit: Calculating it manually in Python gives the same result:
>>> from math import sqrt
>>> v = [0.113967640255, 0.223095775796, 0.283134228235, 0.416793887842]
>>> mu = sum(v) / 4
>>> sqrt(sum([(x - mu)**2 for x in v]) / 4)
0.10932134388775223
Also, in case I am simply not calling numpy correctly:
>>> from numpy import std
>>> std([0.113967640255, 0.223095775796, 0.283134228235, 0.416793887842])
0.10932134388775223
Apache and Wolfram divide by N-1 rather than N. This is a degrees-of-freedom adjustment: since you estimate μ from the sample, dividing by N-1 gives an unbiased estimate of the population variance. You can change NumPy's behavior with the ddof option.
This is described in the NumPy documentation:
The average squared deviation is normally calculated as x.sum() / N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of the infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables. The standard deviation computed in this function is the square root of the estimated variance, so even with ddof=1, it will not be an unbiased estimate of the standard deviation per se.