Calculating the Covariance Matrix in MATLAB

I am implementing a PCA algorithm in MATLAB. I see two different approaches to calculating the covariance matrix:

C = sampleMat.' * sampleMat ./ nSamples;

and

C = cov(data);

What is the difference between these two methods?

PS 1: When I use cov(data), is the following mean-centering step unnecessary?

meanSample = mean(data,1);
data = data - repmat(meanSample, nSamples, 1);

PS 2:

In the first approach, should I divide by nSamples or by nSamples - 1?

1 Answer

In short: cov mainly just adds convenience to the bare formula.

If you type

edit cov

you'll see a lot of code, with these lines at the very bottom:

xc = bsxfun(@minus,x,sum(x,1)/m);  % Remove mean
if flag
    xy = (xc' * xc) / m;
else
    xy = (xc' * xc) / (m-1);  % DEFAULT
end

which is essentially the same as your first line, save for the subtraction of the column-means.

Read the Wikipedia article on sample covariance to see why the default path divides by m-1 rather than m (Bessel's correction, which makes the estimator unbiased).
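
To make the comparison concrete, here is a minimal sketch that reproduces both normalizations by hand and checks them against cov. The data matrix is made up purely for illustration, and the variable names (centered, C_manual, C_biased) are my own, not taken from cov's source.

```matlab
% Illustrative data: 5 samples (rows) by 3 variables (columns); values are arbitrary.
data = [2 4 1; 3 7 0; 5 6 2; 4 9 3; 6 8 1];
nSamples = size(data, 1);

% Center each column by subtracting its mean, as cov does internally.
centered = bsxfun(@minus, data, mean(data, 1));

% Manual covariance with the (nSamples - 1) normalization (cov's default).
C_manual = (centered' * centered) / (nSamples - 1);

% Manual covariance with the nSamples normalization (cov with flag = 1).
C_biased = (centered' * centered) / nSamples;

% Compare against the library function; both differences should be at
% round-off level.
diffDefault = max(max(abs(C_manual - cov(data))));
diffFlag    = max(max(abs(C_biased - cov(data, 1))));
fprintf('max difference, default:  %g\n', diffDefault);
fprintf('max difference, flag = 1: %g\n', diffFlag);
```

Because cov removes the column means itself, centering the data before calling cov(data) does not change the result.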

Note, however, that your first line uses the non-conjugate transpose (.'), whereas the cov version uses the conjugate transpose ('). For real-valued data the two are identical, but for complex-valued data your first line and cov will give different results.
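
A small sketch of the transpose difference, using an arbitrary complex matrix z chosen only for illustration (centering and normalization are left out so that only the transpose differs):

```matlab
% Arbitrary complex-valued data: 3 samples (rows) by 2 variables (columns).
z = [1+2i, 3-1i; 2-1i, 1+4i; 0+1i, 2+2i];

A = z.' * z;   % non-conjugate transpose, as in the question's first line
B = z'  * z;   % conjugate transpose, as in cov

% The two products differ for complex input.
maxDiffComplex = max(max(abs(A - B)));

% With the conjugate transpose, the diagonal (the "variances") is real.
diagImagB = max(abs(imag(diag(B))));

% For real input the two transposes give the same product.
r = real(z);
maxDiffReal = max(max(abs(r.' * r - r' * r)));

fprintf('complex data, max |A - B|: %g\n', maxDiffComplex);
fprintf('real data,    max |A - B|: %g\n', maxDiffReal);
```

The conjugate-transpose form yields a Hermitian matrix with a real diagonal, which is the usual definition of covariance for complex data.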

Also note that cov is a function call to a non-built-in function. That means there may be a (possibly severe) performance penalty when using cov in a loop; MATLAB's JIT compiler, at least in older releases, cannot accelerate calls to non-built-in functions.
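
Whether the penalty matters depends on your MATLAB version and data size, so it is worth measuring. The following is a rough timing sketch, not a rigorous benchmark; the matrix size and repetition count (nReps) are arbitrary assumptions.

```matlab
rng(0);                 % fixed seed so the run is repeatable
X = randn(200, 10);     % arbitrary test data: 200 samples, 10 variables
m = size(X, 1);
nReps = 1000;           % arbitrary repetition count

% Time repeated calls to cov.
tic
for k = 1:nReps
    C1 = cov(X);
end
tCov = toc;

% Time the equivalent inlined formula (default m-1 normalization).
tic
for k = 1:nReps
    xc = bsxfun(@minus, X, sum(X, 1) / m);
    C2 = (xc' * xc) / (m - 1);
end
tInline = toc;

fprintf('cov:    %.4f s for %d calls\n', tCov, nReps);
fprintf('inline: %.4f s for %d iterations\n', tInline, nReps);
```

Both loops compute the same matrix up to round-off, so any difference in the timings comes from function-call and input-checking overhead.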

