I'm implementing PCA using eigenvalue decomposition for sparse data. I know matlab has PCA implemented, but it helps me understand all the technicalities when I write code. I've been following the guidance from here, but I'm getting different results in comparison to built-in function princomp.
Could anybody look at it and point me in the right direction.
Here's the code:
function [mu, Ev, Val ] = pca(data)
% mu - mean image
% Ev - matrix whose columns are the eigenvectors corresponding to the eigen
% values Val
% Val - eigenvalues
if nargin ~= 1
error ('usage: [mu,E,Values] = pca_q1(data)');
end
mu = mean(data)';
nimages = size(data,2);
for i = 1:nimages
data(:,i) = data(:,i)-mu(i);
end
L = data'*data;
[Ev, Vals] = eig(L);
[Ev,Vals] = sort(Ev,Vals);
% computing eigenvector of the real covariance matrix
Ev = data * Ev;
Val = diag(Vals);
Vals = Vals / (nimages - 1);
% normalize Ev to unit length
proper = 0;
for i = 1:nimages
Ev(:,i) = Ev(:,1)/norm(Ev(:,i));
if Vals(i) < 0.00001
Ev(:,i) = zeros(size(Ev,1),1);
else
proper = proper+1;
end;
end;
Ev = Ev(:,1:nimages);
Principal component analysis is a quantitatively rigorous method for achieving this simplification. The method generates a new set of variables, called principal components. Each principal component is a linear combination of the original variables.
Perform PCA on the expression data and plot the result. Select a subset of data points by dragging a box around them. The data points are highlighted and their corresponding labels appear in Selected Data. You can then export the selected data to the workspace by selecting Export.
Principal component analysis, or PCA, is a statistical procedure that allows you to summarize the information content in large data tables by means of a smaller set of “summary indices” that can be more easily visualized and analyzed.
To interpret each principal components, examine the magnitude and direction of the coefficients for the original variables. The larger the absolute value of the coefficient, the more important the corresponding variable is in calculating the component.
Here's how I would do it:
function [V newX D] = myPCA(X)
X = bsxfun(@minus, X, mean(X,1)); %# zero-center
C = (X'*X)./(size(X,1)-1); %'# cov(X)
[V D] = eig(C);
[D order] = sort(diag(D), 'descend'); %# sort cols high to low
V = V(:,order);
newX = X*V(:,1:end);
end
and an example to compare against the PRINCOMP function from the Statistics Toolbox:
load fisheriris
[V newX D] = myPCA(meas);
[PC newData Var] = princomp(meas);
You might also be interested in this related post about performing PCA by SVD.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With