
numpy cov (covariance) function, what exactly does it compute?

I assume numpy.cov(X) computes the sample covariance matrix as:

1/(N-1) * Sum (x_i - m)(x_i - m)^T (where m is the mean)

I.e., as a sum of outer products. But nowhere in the documentation does it actually say this; it just says "Estimate a covariance matrix".

Can anyone confirm whether this is what it does internally? (I know I can change the constant out front with the bias parameter.)

Flash asked Apr 17 '13 14:04


1 Answer

As you can see from the source, in the simplest case (no masks, N variables with M samples each) it returns the (N, N) covariance matrix calculated as:

(x-m) * (x-m).T.conj() / (M - 1)

where x is the (N, M) data array, m holds the mean of each variable, and * represents the matrix product[1].

Implemented roughly as:

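import numpy as np

def cov(X):
    # Rows are variables, columns are observations (the rowvar default).
    X = np.asarray(X)
    X = X - X.mean(axis=1, keepdims=True)  # subtract each variable's mean
    M = X.shape[1]                         # number of observations
    fact = float(M - 1)
    return np.dot(X, X.T.conj()) / fact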
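A quick way to convince yourself of this is to compare the formula against numpy.cov on random data. This is only a sanity-check sketch, assuming the default layout (rows are variables, columns are observations); the array shape and the seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 50))   # 3 variables (rows), 50 observations (columns)
M = X.shape[1]

# Centre each variable, then take the matrix product and divide by M - 1.
Xc = X - X.mean(axis=1, keepdims=True)
manual = Xc @ Xc.T.conj() / (M - 1)

# Default: unbiased estimate (divide by M - 1).
matches_default = np.allclose(manual, np.cov(X))

# bias=True switches the divisor to M.
matches_biased = np.allclose(manual * (M - 1) / M, np.cov(X, bias=True))

print(matches_default, matches_biased)
```

Both comparisons should print True, which agrees with the question's 1/(N-1) formula once N there is read as the number of samples.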

If you want to review the source, look here instead of the link from Mr E unless you're interested in masked arrays. As you mentioned, the documentation isn't great.

[1] which in this case is effectively (but not exactly) the outer product, because (x-m) has M column vectors of length N and thus (x-m).T has as many row vectors. The end result is the sum of all the outer products, one per sample. The same * gives the inner (scalar) product if the order is reversed. Technically, though, both are just standard matrix multiplications, and a true outer product is only the product of a column vector with a row vector.
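To make the footnote concrete, here is a small sketch showing that the single matrix product equals the sum of per-sample outer products. The names Xc, outer_sum and matrix_product are just illustrative, and the data is random.

```python
import numpy as np

rng = np.random.default_rng(1)
Xc = rng.normal(size=(3, 20))              # N = 3 variables, M = 20 samples
Xc = Xc - Xc.mean(axis=1, keepdims=True)   # centred data, i.e. (x - m)

# One outer product per sample (each column of Xc), summed together.
outer_sum = sum(np.outer(col, col.conj()) for col in Xc.T)

# The same thing as one matrix product.
matrix_product = Xc @ Xc.T.conj()

# Reversing the order for a single column gives the inner (scalar) product.
col = Xc[:, 0]
inner = col.conj() @ col

print(np.allclose(outer_sum, matrix_product), np.ndim(inner))
```

Dividing either result by M - 1 gives the covariance matrix described above.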

askewchan answered Nov 15 '22 11:11