I would like to be able to construct the scores of a principal component analysis using its loadings, but I cannot figure out what the princomp function is actually doing when it computes the scores of a dataset. A toy example:
cc <- matrix(1:24, ncol=4)
PCAcc <- princomp(cc, scores=TRUE, cor=TRUE)
PCAcc$loadings
Loadings:
     Comp.1 Comp.2 Comp.3 Comp.4
[1,]  0.500  0.866
[2,]  0.500 -0.289  0.816
[3,]  0.500 -0.289 -0.408 -0.707
[4,]  0.500 -0.289 -0.408  0.707
PCAcc$scores
Comp.1 Comp.2 Comp.3 Comp.4
[1,] -2.92770 -6.661338e-16 -3.330669e-16 0
[2,] -1.75662 -4.440892e-16 -2.220446e-16 0
[3,] -0.58554 -1.110223e-16 -6.938894e-17 0
[4,] 0.58554 1.110223e-16 6.938894e-17 0
[5,] 1.75662 4.440892e-16 2.220446e-16 0
[6,] 2.92770 6.661338e-16 3.330669e-16 0
My understanding is that the scores are a linear combination of the loadings and the original data rescaled. Trying by "hand":
rescaled <- t(t(cc)-apply(cc,2,mean))
rescaled%*%PCAcc$loadings
Comp.1 Comp.2 Comp.3 Comp.4
[1,] -5 -1.332268e-15 -4.440892e-16 0
[2,] -3 -6.661338e-16 -3.330669e-16 0
[3,] -1 -2.220446e-16 -1.110223e-16 0
[4,] 1 2.220446e-16 1.110223e-16 0
[5,] 3 6.661338e-16 3.330669e-16 0
[6,] 5 1.332268e-15 4.440892e-16 0
The columns are off by a factor of 1.707825, 2, and 1.333333, respectively. Why is this? Since the toy data matrix has the same variance in each column, normalization shouldn't be necessary here. Any help is greatly appreciated.
Thanks!
Positive loadings indicate that a variable and a principal component are positively correlated: an increase in one goes with an increase in the other. Negative loadings indicate a negative correlation. Loadings that are large in absolute value (whether positive or negative) indicate that a variable has a strong influence on that principal component.
Negative correlations among variables and negative loadings do not cause any specific concerns in PCA. In the interpretation of PCA, a negative loading simply means that a certain characteristic is lacking in a latent variable associated with the given principal component.
The function princomp() uses the spectral (eigen) decomposition approach. The functions prcomp() and PCA() [FactoMineR] use the singular value decomposition (SVD). According to the R help, the SVD approach has slightly better numerical accuracy, so prcomp() is generally preferred over princomp().
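As a rough illustration of how the two functions relate on the toy matrix from the question (a sketch only; the object names pc_eig and pc_svd are made up for this example): prcomp() standardizes with the usual n - 1 divisor while princomp() uses n, so the first-component scores should differ by a constant factor of sqrt(n/(n-1)), and possibly by sign.

```r
cc <- matrix(1:24, ncol = 4)
n  <- nrow(cc)

# Eigendecomposition of the correlation matrix (divisor n)
pc_eig <- princomp(cc, cor = TRUE)

# SVD of the centered, standardized data (divisor n - 1)
pc_svd <- prcomp(cc, center = TRUE, scale. = TRUE)

# The sign of a component is arbitrary, so compare absolute values
ratio <- abs(pc_eig$scores[, 1]) / abs(pc_svd$x[, 1])

print(ratio)             # one value per row, all expected to be equal
print(sqrt(n / (n - 1))) # the factor implied by the two divisors
```

The loadings themselves are unaffected by this difference in divisor; only the scale of the scores changes.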
PCA loadings are the coefficients of the linear combination of the original variables from which the principal components (PCs) are constructed.
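With cor=TRUE, princomp() not only centers each column but also divides it by its standard deviation before multiplying by the loadings. That standard deviation is computed with divisor n rather than n - 1; for your columns it is 1.707825, which is exactly the factor you found for the first component (the other two components are just floating-point noise around zero). The centers and scalings are stored in the fitted object, so you need
scale(cc, center=PCAcc$center, scale=PCAcc$scale) %*% PCAcc$loadings
or, more simply,
predict(PCAcc, newdata=cc)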
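To check this on the toy matrix, here is a small sketch that rebuilds the scores by hand. The names sd_n and manual are introduced only for this illustration, and the standard deviation is recomputed with divisor n on the assumption that this is what princomp() stores in PCAcc$scale.

```r
cc <- matrix(1:24, ncol = 4)
PCAcc <- princomp(cc, scores = TRUE, cor = TRUE)

# Standard deviation with divisor n (not n - 1)
sd_n <- apply(cc, 2, function(x) sqrt(mean((x - mean(x))^2)))

# Center, scale, then project onto the loadings
manual <- scale(cc, center = colMeans(cc), scale = sd_n) %*%
  unclass(PCAcc$loadings)

# Each comparison should print TRUE
print(isTRUE(all.equal(sd_n, PCAcc$scale, check.attributes = FALSE)))
print(isTRUE(all.equal(manual, PCAcc$scores, check.attributes = FALSE)))
print(isTRUE(all.equal(predict(PCAcc, newdata = cc), PCAcc$scores,
                       check.attributes = FALSE)))
```

Using predict() is usually the safer choice for new data, since it reuses the stored centers and scalings rather than recomputing them.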