I am trying to understand PCA by working through practical examples. Sadly, most tutorials I have found don't show simple, practical applications of it. After a lot of searching, I came across this one:
http://yatani.jp/HCIstats/PCA
It is a nice, simple tutorial, but it is written in R and I want to re-create its results in Matlab. I am new to Matlab and have been unsuccessful so far. I created the arrays as follows:
Price = [6,7,6,5,7,6,5,6,3,1,2,5,2,3,1,2];
Software = [5,3,4,7,7,4,7,5,5,3,6,7,4,5,6,3];
Aesthetics = [3,2,4,1,5,2,2,4,6,7,6,7,5,6,5,7];
Brand = [4,2,5,3,5,3,1,4,7,5,7,6,6,5,5,7];
Then in his example, he does this
data <- data.frame(Price, Software, Aesthetics, Brand)
A quick search suggests that this combines the vectors into a data frame in R, i.e. a table with one named column per vector. So in Matlab I did this:
dataTable(:,1) = Price;
dataTable(:,2) = Software;
dataTable(:,3) = Aesthetics;
dataTable(:,4) = Brand;
The next part is where I am unsure:
pca <- princomp(data, cor=TRUE)
summary(pca, loadings=TRUE)
I have tried using Matlab's PCA function
[COEFF SCORE LATENT] = princomp(dataTable)
But my results do not match the ones shown in the tutorial at all. My results are
COEFF =
-0.5958 0.3786 0.7065 -0.0511
-0.1085 0.8343 -0.5402 -0.0210
0.6053 0.2675 0.3179 -0.6789
0.5166 0.2985 0.3287 0.7321
SCORE =
-2.3362 0.0276 0.6113 0.4237
-4.3534 -2.1268 1.4228 -0.3707
-1.1057 -0.2406 1.7981 0.4979
-3.6847 0.4840 -2.1400 1.0586
-1.4218 2.9083 1.2020 -0.2952
-3.3495 -1.3726 0.5049 0.3916
-4.1126 0.1546 -2.4795 -1.0846
-1.7309 0.2951 0.9293 -0.2552
2.8169 0.5898 0.4318 0.7366
3.7976 -2.1655 -0.2402 -1.2622
3.3041 1.0454 -0.8148 0.7667
1.4969 2.9845 0.7537 -0.8187
2.3993 -1.1891 -0.3811 0.7556
1.7836 -0.0072 -0.2255 -0.7276
2.2613 -0.1977 -2.4966 0.0326
4.2350 -1.1899 1.1236 0.1509
LATENT =
9.3241
2.2117
1.8727
0.5124
Yet the results in the tutorial are
Importance of components:
Comp.1 Comp.2 Comp.3 Comp.4
Standard deviation 1.5589391 0.9804092 0.6816673 0.37925777
Proportion of Variance 0.6075727 0.2403006 0.1161676 0.03595911
Cumulative Proportion 0.6075727 0.8478733 0.9640409 1.00000000
Loadings:
           Comp.1 Comp.2 Comp.3 Comp.4
Price      -0.523         0.848
Software   -0.177  0.977 -0.120
Aesthetics  0.597  0.134  0.295 -0.734
Brand       0.583  0.167  0.423  0.674
Could anyone please explain why my results differ so much from the tutorial? Am I using the wrong Matlab function?
Also, if you can point me to any other simple, practical applications of PCA, that would be very helpful. I am still trying to get my head around the concepts, and I find it easier to learn from examples that I can code, run and play about with myself.
Any help would be much appreciated!
Edit: The issue is purely the scaling.
R code:
summary(princomp(data, cor = FALSE), loadings=T, cutoff = 0.01)
Loadings:
Comp.1 Comp.2 Comp.3 Comp.4
Price -0.596 -0.379 0.706 -0.051
Software -0.109 -0.834 -0.540 -0.021
Aesthetics 0.605 -0.268 0.318 -0.679
Brand 0.517 -0.298 0.329 0.732
According to the Matlab help you should use this if you want scaling:
Matlab code:
princomp(zscore(X))
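As a rough illustration of what that scaling does, here is a minimal sketch that uses only base Matlab functions (no toolbox calls, so it is my own stand-in rather than the internals of princomp). It standardises the columns by hand and takes the eigen-decomposition of the resulting covariance matrix, which is the correlation matrix of the original data. The variable names (X, Z, coeff, latent, sdev, propVar) are my own choices.

```matlab
% Tutorial data, one column per variable (rows are the 16 observations).
Price      = [6,7,6,5,7,6,5,6,3,1,2,5,2,3,1,2];
Software   = [5,3,4,7,7,4,7,5,5,3,6,7,4,5,6,3];
Aesthetics = [3,2,4,1,5,2,2,4,6,7,6,7,5,6,5,7];
Brand      = [4,2,5,3,5,3,1,4,7,5,7,6,6,5,5,7];
X = [Price', Software', Aesthetics', Brand'];

% Standardise each column by hand: subtract the mean, divide by the
% standard deviation (this is what zscore does for a matrix).
n = size(X, 1);
Z = (X - repmat(mean(X), n, 1)) ./ repmat(std(X), n, 1);

% Covariance of the standardised data = correlation matrix of X.
C = cov(Z);
C = (C + C') / 2;              % guard against round-off asymmetry

% Eigen-decomposition, sorted so the largest variance comes first.
[V, D] = eig(C);
[latent, order] = sort(diag(D), 'descend');
coeff   = V(:, order);         % loadings, one component per column
sdev    = sqrt(latent);        % compare with the "Standard deviation" row
propVar = latent / sum(latent);% compare with "Proportion of Variance"

disp(sdev');
disp(propVar');
disp(coeff);
```

The columns of coeff may come out with flipped signs relative to the tutorial's loadings; that does not change the components.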
From help(princomp) (in R):
The calculation is done using eigen on the correlation or covariance matrix, as determined by cor. This is done for compatibility with the S-PLUS result. A preferred method of calculation is to use svd on x, as is done in prcomp.
Note that the default calculation uses divisor N for the covariance matrix.
In the documentation of the R function prcomp (help(prcomp)) you can read:
The calculation is done by a singular value decomposition of the (centered and possibly scaled) data matrix, not by using eigen on the covariance matrix. This is generally the preferred method for numerical accuracy. [...] Unlike princomp, variances are computed with the usual divisor N - 1.
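To see what the divisor changes in the unscaled case, here is a small sketch in base Matlab; the variable names are my own. It builds the covariance matrix once with divisor N - 1 and once with divisor N and compares the eigenvalues. The loadings are unaffected by the divisor; only the reported variances are rescaled by (N - 1)/N.

```matlab
% Tutorial data, one column per variable.
Price      = [6,7,6,5,7,6,5,6,3,1,2,5,2,3,1,2];
Software   = [5,3,4,7,7,4,7,5,5,3,6,7,4,5,6,3];
Aesthetics = [3,2,4,1,5,2,2,4,6,7,6,7,5,6,5,7];
Brand      = [4,2,5,3,5,3,1,4,7,5,7,6,6,5,5,7];
X = [Price', Software', Aesthetics', Brand'];

n  = size(X, 1);
Xc = X - repmat(mean(X), n, 1);     % centre each column

% Same cross-product matrix, two different divisors.
latentN1 = sort(eig((Xc' * Xc) / (n - 1)), 'descend'); % divisor N - 1
latentN  = sort(eig((Xc' * Xc) / n),       'descend'); % divisor N

disp(latentN1');          % component variances with divisor N - 1
disp(sqrt(latentN1)');    % the corresponding standard deviations
disp(sqrt(latentN)');     % standard deviations with divisor N
disp((latentN ./ latentN1)');   % every ratio equals (n - 1)/n
```

With divisor N - 1 the variances agree with the LATENT values in the question, and their square roots agree with the prcomp standard deviations. For the correlation-based analysis the divisor cancels out, which is why it plays no role in the tutorial's cor=TRUE result.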
The Matlab function apparently uses the svd algorithm. If I use prcomp (without scaling, i.e., not based on correlations) with the example data I get:
> prcomp(data)
Standard deviations:
[1] 3.0535362 1.4871803 1.3684570 0.7158006
Rotation:
PC1 PC2 PC3 PC4
Price -0.5957661 0.3786184 -0.7064672 0.05113761
Software -0.1085472 0.8342628 0.5401678 0.02101742
Aesthetics 0.6053008 0.2675111 -0.3179391 0.67894297
Brand 0.5166152 0.2984819 -0.3286908 -0.73210631
Apart from the signs, which are arbitrary for principal components, this is identical to the Matlab output.
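For anyone who wants to reproduce this without either built-in function, here is a minimal sketch of the SVD route in base Matlab. It is my own illustration of the approach described in help(prcomp), not the actual implementation of princomp or prcomp, and the sign convention in the loop (largest-magnitude loading in each column made positive) is just one arbitrary choice that makes outputs from different programs easier to compare.

```matlab
% Tutorial data, one column per variable.
Price      = [6,7,6,5,7,6,5,6,3,1,2,5,2,3,1,2];
Software   = [5,3,4,7,7,4,7,5,5,3,6,7,4,5,6,3];
Aesthetics = [3,2,4,1,5,2,2,4,6,7,6,7,5,6,5,7];
Brand      = [4,2,5,3,5,3,1,4,7,5,7,6,6,5,5,7];
X = [Price', Software', Aesthetics', Brand'];

n  = size(X, 1);
Xc = X - repmat(mean(X), n, 1);    % centre, but do not scale

% Economy-size SVD of the centred data: Xc = U * S * V'.
[U, S, V] = svd(Xc, 0);
sdev  = diag(S) / sqrt(n - 1);     % standard deviations, divisor N - 1
coeff = V;                         % loadings (rotation matrix)

% Make the largest-magnitude loading of each component positive, so
% results from different programs can be compared directly.
[~, idx] = max(abs(coeff), [], 1);
for k = 1:size(coeff, 2)
    if coeff(idx(k), k) < 0
        coeff(:, k) = -coeff(:, k);
    end
end

score = Xc * coeff;                % observations in component coordinates

disp(sdev');
disp(coeff);
```

The variance of each column of score equals the square of the matching entry of sdev, so the LATENT vector in the question corresponds to sdev.^2.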