Python numpy.var returning wrong values

I'm trying to do a simple variance calculation on a set of 3 numbers:

numpy.var([0.82159889, 0.26007962, 0.09818412])

which returns

0.09609366366174843

However, when I calculate the variance by hand, it should be

0.1441405

Seems like such a simple thing, but I haven't been able to find an answer yet.

asked by pajarraco, Oct 09 '14 02:10


2 Answers

As the documentation explains:

ddof : int, optional
    "Delta Degrees of Freedom": the divisor used in the calculation is
    ``N - ddof``, where ``N`` represents the number of elements. By
    default `ddof` is zero.

And so you have:

>>> import numpy
>>> numpy.var([0.82159889, 0.26007962, 0.09818412], ddof=0)
0.09609366366174843
>>> numpy.var([0.82159889, 0.26007962, 0.09818412], ddof=1)
0.14414049549262264
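To make the "divisor is N - ddof" rule concrete, here is a minimal sketch that reimplements it by hand and compares it with np.var. The helper name `var_with_ddof` is hypothetical and exists only for illustration; it is not part of NumPy, and unlike np.var it raises an error instead of returning inf/nan when N - ddof is not positive.

```python
import numpy as np

def var_with_ddof(values, ddof=0):
    """Hypothetical helper: variance with divisor N - ddof (not a NumPy API)."""
    x = np.asarray(values, dtype=float)
    n = x.size
    if n - ddof <= 0:
        # np.var would warn and return inf/nan here; this sketch raises instead.
        raise ValueError("N - ddof must be positive")
    # Sum of squared deviations from the mean
    sse = np.sum((x - x.mean()) ** 2)
    return sse / (n - ddof)

vals = [0.82159889, 0.26007962, 0.09818412]
for d in (0, 1):
    # Print the hand-rolled value next to NumPy's for each ddof
    print(d, var_with_ddof(vals, d), np.var(vals, ddof=d))
```

With ddof=0 the divisor is 3 and with ddof=1 it is 2, which is exactly the difference between the two numbers in the question.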

Both conventions (dividing by N or by N - 1) are common enough that you always need to check which one a given package uses, in any language.
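As a small illustration that the convention differs even between libraries in the same language, here is a sketch comparing NumPy with Python's standard-library statistics module, which exposes the two conventions as separate functions (pvariance divides by N, variance divides by N - 1).

```python
import statistics
import numpy as np

vals = [0.82159889, 0.26007962, 0.09818412]

# Standard library: one function per convention
pop = statistics.pvariance(vals)   # population variance, divides by N
samp = statistics.variance(vals)   # sample variance, divides by N - 1

# NumPy: one function, divisor chosen with ddof (default 0)
np_pop = np.var(vals)
np_samp = np.var(vals, ddof=1)

print(pop, np_pop)
print(samp, np_samp)
```

The results agree up to floating-point rounding, since statistics computes with exact intermediate arithmetic while NumPy uses floats throughout.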

answered by DSM, Oct 12 '22 22:10


By default, np.var calculates the population variance (dividing by N), not the sample variance (dividing by N - 1).
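For reference, the two definitions in standard notation, with $\bar{x}$ the mean of the $N$ values $x_1, \dots, x_N$; NumPy's ddof parameter generalizes the divisor to $N - \text{ddof}$:

```latex
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\mathrm{SSE} = \sum_{i=1}^{N} (x_i - \bar{x})^2

\sigma^2 = \frac{\mathrm{SSE}}{N} \quad (\text{ddof}=0), \qquad
s^2 = \frac{\mathrm{SSE}}{N-1} \quad (\text{ddof}=1)
```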

The sum of squared errors (SSE), i.e. the sum of squared deviations from the mean, can be calculated as follows:

>>> import numpy as np
>>> vals = [0.82159889, 0.26007962, 0.09818412]
>>> mean = sum(vals)/3.0
>>> mean
0.3932875433333333
>>> sum((mean-val)**2 for val in vals)
0.2882809909852453
>>> sse = sum((mean-val)**2 for val in vals)

This is the population variance:

>>> sse/3 
0.09609366366174843
>>> np.var(vals)
0.09609366366174843

This is the sample variance:

>>> sse/(3-1)
0.14414049549262264
>>> np.var(vals, ddof=1)
0.14414049549262264
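Since both results divide the same SSE, one can be converted into the other by the factor N / (N - 1), commonly called Bessel's correction. A minimal sketch of that conversion, using the same three values:

```python
import numpy as np

vals = np.array([0.82159889, 0.26007962, 0.09818412])
n = vals.size

pop_var = np.var(vals)            # ddof=0: SSE / N
samp_var = np.var(vals, ddof=1)   # ddof=1: SSE / (N - 1)

# Rescale the population variance by N / (N - 1) to get the sample variance
converted = pop_var * n / (n - 1)
print(samp_var, converted)
```

For N = 3 the factor is 1.5, which is why 0.0961 becomes 0.1441.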

You can read more about the difference between population and sample variance in any introductory statistics reference.

answered by Russia Must Remove Putin, Oct 12 '22 23:10