
Why do numpy cov diagonal elements and var functions have different values?

Tags: python, numpy

In [127]: x = np.arange(2)

In [128]: np.cov(x,x)
Out[128]:
array([[ 0.5,  0.5],
       [ 0.5,  0.5]])

In [129]: x.var()
Out[129]: 0.25

Why is this the behavior? I believe that covariance matrix diagonal elements should be the variance of the series.

zsljulius asked Jan 09 '14 20:01



1 Answer

In numpy, cov defaults to a "delta degrees of freedom" (ddof) of 1, while var defaults to a ddof of 0. From the Notes section of the numpy.var docstring:

Notes
-----
The variance is the average of the squared deviations from the mean,
i.e.,  ``var = mean(abs(x - x.mean())**2)``.

The mean is normally calculated as ``x.sum() / N``, where ``N = len(x)``.
If, however, `ddof` is specified, the divisor ``N - ddof`` is used
instead.  In standard statistical practice, ``ddof=1`` provides an
unbiased estimator of the variance of a hypothetical infinite population.
``ddof=0`` provides a maximum likelihood estimate of the variance for
normally distributed variables.
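To make the divisor concrete, here is a minimal sketch that works through the formula by hand for the same x as in the question. The variable names (sq_dev, var_ddof0, var_ddof1) are only illustrative and do not come from numpy.

```python
import numpy as np

x = np.arange(2)                  # array([0, 1]), as in the question
n = len(x)

# Squared deviations from the mean: mean is 0.5, so each term is 0.25
sq_dev = (x - x.mean()) ** 2

var_ddof0 = sq_dev.sum() / (n - 0)   # divisor N     -> 0.25, matches x.var()
var_ddof1 = sq_dev.sum() / (n - 1)   # divisor N - 1 -> 0.5, matches np.cov(x, x)

print(var_ddof0, var_ddof1)
```

With only two samples, the choice of divisor doubles the result, which is why the gap in the question looks so large.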

So you can get them to agree by taking:

In [69]: np.cov(x, x)  # defaulting to ddof=1
Out[69]: 
array([[ 0.5,  0.5],
       [ 0.5,  0.5]])

In [70]: x.var(ddof=1)
Out[70]: 0.5

In [71]: np.cov(x, x, ddof=0)
Out[71]: 
array([[ 0.25,  0.25],
       [ 0.25,  0.25]])

In [72]: x.var()  # defaulting to ddof=0
Out[72]: 0.25
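
The same relationship should hold for the diagonal of a larger covariance matrix. The following is a sketch under assumptions: the random data, its shape, and the variable names are made up for illustration, and each row is treated as one variable (numpy.cov's default rowvar=True).

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(3, 50))   # hypothetical data: 3 variables, 50 observations each

# Diagonal of the covariance matrix vs. per-variable variance, same ddof on both sides
cov_diag_sample = np.diag(np.cov(data))          # cov defaults to ddof=1
var_sample = data.var(axis=1, ddof=1)

cov_diag_pop = np.diag(np.cov(data, ddof=0))
var_pop = data.var(axis=1)                       # var defaults to ddof=0

print(np.allclose(cov_diag_sample, var_sample))
print(np.allclose(cov_diag_pop, var_pop))
```

As long as both calls use the same ddof, the diagonal of the covariance matrix and the variances agree up to floating-point rounding.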
mmdanziger answered Oct 26 '22 03:10