Calculating Covariance Matrix in Matlab

Tags:

matlab

covariance

I am implementing a PCA algorithm in MATLAB. I see two different approaches to calculating the covariance matrix:

C = sampleMat.' * sampleMat ./ nSamples;

and

C = cov(data);

What is the difference between these two methods?

PS 1: When I use cov(data), is the following centering step unnecessary?

meanSample = mean(data,1);                        % row vector of column means
data = data - repmat(meanSample, nSamples, 1);    % subtract the mean from every row

PS 2: In the first approach, should I use nSamples or nSamples - 1?


asked Dec 04 '12 12:12

kamaci

1 Answer

In short: cov mainly just adds convenience to the bare formula.

If you type

edit cov

you'll see a fair amount of code, with these lines at the very bottom:

xc = bsxfun(@minus,x,sum(x,1)/m);  % Remove mean    
if flag
    xy = (xc' * xc) / m;
else
    xy = (xc' * xc) / (m-1);  % DEFAULT 
end

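which is essentially the same as your first line, save for two things: cov subtracts the column means itself, and by default it divides by m-1 rather than m. Because cov removes the means internally, centering the data before calling cov(data) is unnecessary.

See the Wikipedia article on sample covariance for why the default path divides by m-1: this is Bessel's correction, which makes the estimate unbiased when the mean is estimated from the same samples. In your first approach, use nSamples - 1 if you want that same unbiased estimate.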
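As a rough check of the equivalence described above, the following sketch centers a small made-up matrix by hand and compares both normalizations against cov. The random test data and the variable names (data, nSamples, dataCentered, errUnbiased, errBiased) are assumptions for illustration only.

```matlab
% Illustrative data: 200 observations (rows) of 3 variables (columns).
rng(0);                               % fixed seed so the run is repeatable
data = randn(200, 3);
nSamples = size(data, 1);

% Center each column by hand, as the quoted cov source does internally.
dataCentered = bsxfun(@minus, data, mean(data, 1));

% Manual covariance with both normalizations.
Cunbiased = (dataCentered' * dataCentered) / (nSamples - 1);
Cbiased   = (dataCentered' * dataCentered) / nSamples;

% Compare against cov: the default divides by N-1, cov(data, 1) divides by N.
errUnbiased = max(max(abs(Cunbiased - cov(data))));
errBiased   = max(max(abs(Cbiased   - cov(data, 1))));
fprintf('max abs difference (N-1): %g\n', errUnbiased);
fprintf('max abs difference (N):   %g\n', errBiased);
```

Both differences should be at the level of floating-point rounding error. Dropping the centering step, or mixing the two normalizations, would make them clearly nonzero.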

Note, however, that your first line uses the plain (non-conjugate) transpose (.'), whereas the cov version uses the conjugate transpose ('). For real-valued data the two are identical, but for complex-valued data your formula and cov will give different results.
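To make the transpose difference concrete, here is a minimal sketch on a tiny complex-valued matrix. The data and the names (z, zc, Cplain, Cconj) are made up for illustration, and the comparison with cov assumes it follows the source lines quoted above.

```matlab
% Illustrative complex-valued data: 4 observations of 2 variables.
z = [1+2i, 3-1i;
     2-1i, 0+4i;
     0+1i, 1+1i;
     3+0i, 2-2i];
m  = size(z, 1);
zc = bsxfun(@minus, z, mean(z, 1));   % remove column means

Cplain = (zc.' * zc) / (m - 1);       % non-conjugate transpose (.')
Cconj  = (zc'  * zc) / (m - 1);       % conjugate transpose ('), as in cov

% Cconj is Hermitian, with (numerically) real variances on its diagonal.
hermErr = max(max(abs(Cconj - Cconj')));
diagImag = max(abs(imag(diag(Cconj))));

% Cplain is complex symmetric instead, and differs from Cconj.
plainVsConj = max(max(abs(Cplain - Cconj)));

% Cconj should agree with cov up to rounding error.
covErr = max(max(abs(Cconj - cov(z))));
fprintf('Hermitian error: %g, cov mismatch: %g, plain vs conj: %g\n', ...
        hermErr, covErr, plainVsConj);
```

For real-valued inputs Cplain and Cconj coincide, so the distinction only matters once complex samples are involved.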

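Also note that cov is an ordinary M-file function, not a built-in. That means there can be a (possibly severe) performance penalty when calling cov in a loop, because MATLAB's JIT compiler cannot accelerate non-built-in functions. This reflects MATLAB releases from around the time of the question; if performance matters, time both variants in your own release.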

answered Oct 04 '22 12:10

Rody Oldenhuis
