Different standard deviation for same input from Wolfram and numpy

I am currently porting an algorithm from Java to Python. One step is to calculate the standard deviation of a list of values. The original implementation uses DescriptiveStatistics.getStandardDeviation from the Apache Math 1.1 library for this. I use the standard deviation function of numpy 1.5. The problem is that the two give (very) different results for the same input. The sample I have is this:

[0.113967640255, 0.223095775796, 0.283134228235, 0.416793887842]

I get the following results:

numpy           : 0.10932134388775223
Apache Math 1.1 : 0.12620366805397404
Wolfram Alpha   : 0.12620366805397404

I checked with Wolfram Alpha to get a third opinion. I do not think that such a difference can be explained by precision alone. Does anyone have any idea why this is happening, and what I could do about it?

Edit: Calculating it manually in Python gives the same result as numpy:

>>> from math import sqrt
>>> v = [0.113967640255, 0.223095775796, 0.283134228235, 0.416793887842]
>>> mu = sum(v) / 4
>>> sqrt(sum([(x - mu)**2 for x in v]) / 4)
0.10932134388775223

Also, in case I am simply calling it incorrectly, this is how I use numpy:

>>> from numpy import std
>>> std([0.113967640255, 0.223095775796, 0.283134228235, 0.416793887842])
0.10932134388775223

asked Jan 01 '11 20:01 by Björn Pollex


1 Answer

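Apache and Wolfram divide by N-1 rather than N. This is a degrees-of-freedom adjustment: because μ is estimated from the same data, one degree of freedom is used up. Dividing by N-1 gives an unbiased estimate of the population variance, and its square root is the usual sample standard deviation. NumPy divides by N by default; you can change this with the ddof option (ddof=1 gives the N-1 divisor).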

This is described in the NumPy documentation:

The average squared deviation is normally calculated as x.sum() / N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of the infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables. The standard deviation computed in this function is the square root of the estimated variance, so even with ddof=1, it will not be an unbiased estimate of the standard deviation per se.
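
To make the effect of the divisor concrete, here is a minimal sketch in plain Python using the sample from the question. The helper std_with_ddof is hypothetical (it is not part of NumPy); it only mimics the N - ddof divisor described above. With NumPy itself, the equivalent call would be numpy.std(v, ddof=1).

```python
from math import sqrt
import statistics

v = [0.113967640255, 0.223095775796, 0.283134228235, 0.416793887842]


def std_with_ddof(values, ddof=0):
    """Hypothetical helper: standard deviation with divisor N - ddof."""
    n = len(values)
    mu = sum(values) / n
    return sqrt(sum((x - mu) ** 2 for x in values) / (n - ddof))


population_sd = std_with_ddof(v, ddof=0)  # divisor N, like NumPy's default
sample_sd = std_with_ddof(v, ddof=1)      # divisor N - 1

print(population_sd)  # about 0.109, the figure numpy reported in the question
print(sample_sd)      # about 0.126, in line with the Apache and Wolfram figures

# Cross-check against the standard library, which offers both variants:
print(statistics.pstdev(v))  # population form (divisor N)
print(statistics.stdev(v))   # sample form (divisor N - 1)
```

The two values differ only by the factor sqrt(N / (N - 1)), which is noticeable for N = 4 and shrinks as the sample grows.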

answered Nov 01 '22 12:11 by Tristan