Constructing scores from princomp loadings in R

Tags: r, pca, princomp

I would like to be able to construct the scores of a principal component analysis using its loadings, but I cannot figure out what the princomp function is actually doing when it computes the scores of a dataset. A toy example:

cc <- matrix(1:24, ncol = 4)
PCAcc <- princomp(cc, scores = TRUE, cor = TRUE)
PCAcc$loadings

Loadings:
     Comp.1 Comp.2 Comp.3 Comp.4
[1,]  0.500  0.866              
[2,]  0.500 -0.289  0.816       
[3,]  0.500 -0.289 -0.408 -0.707
[4,]  0.500 -0.289 -0.408  0.707

PCAcc$scores

       Comp.1        Comp.2        Comp.3 Comp.4
[1,] -2.92770 -6.661338e-16 -3.330669e-16      0
[2,] -1.75662 -4.440892e-16 -2.220446e-16      0
[3,] -0.58554 -1.110223e-16 -6.938894e-17      0
[4,]  0.58554  1.110223e-16  6.938894e-17      0
[5,]  1.75662  4.440892e-16  2.220446e-16      0
[6,]  2.92770  6.661338e-16  3.330669e-16      0

My understanding is that the scores are the rescaled original data multiplied by the loadings. Trying it by "hand":

rescaled <- t(t(cc) - apply(cc, 2, mean))  # subtract each column's mean
rescaled %*% PCAcc$loadings

     Comp.1        Comp.2        Comp.3 Comp.4
[1,]     -5 -1.332268e-15 -4.440892e-16      0
[2,]     -3 -6.661338e-16 -3.330669e-16      0
[3,]     -1 -2.220446e-16 -1.110223e-16      0
[4,]      1  2.220446e-16  1.110223e-16      0
[5,]      3  6.661338e-16  3.330669e-16      0
[6,]      5  1.332268e-15  4.440892e-16      0

The first three columns are off by factors of 1.707825, 2, and 1.333333, respectively. Why is this? Since the toy data matrix has the same variance in each column, normalization shouldn't be necessary here. Any help is greatly appreciated.
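A quick way to see where the Comp.1 factor of 1.707825 comes from, as a minimal sketch. It assumes, as the princomp help page notes, that princomp computes variances with divisor n rather than the n - 1 used by var() and sd(). The names x, sd_n and comp1 are illustrative only.

```r
# x stands in for any single column of cc: every column is a shift of 1:6,
# so all four columns have identical centered values
x <- 1:6
n <- length(x)

# Standard deviation with divisor n, the convention princomp uses
sd_n <- sqrt(sum((x - mean(x))^2) / n)
print(sd_n)    # about 1.707825

# Standard deviation with divisor n - 1, which is what sd() returns
print(sd(x))   # about 1.870829

# Comp.1 from the hand computation, divided by sd_n
comp1 <- c(-5, -3, -1, 1, 3, 5) / sd_n
print(comp1)   # matches Comp.1 of PCAcc$scores
```

Dividing the hand-computed Comp.1 column by sd_n recovers the princomp scores, so the discrepancy is a single per-column rescaling rather than a difference in the loadings.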

Thanks!


asked Jun 01 '13 06:06

Escotch

1 Answer

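You need to standardize the data the same way princomp does before multiplying by the loadings. With cor = TRUE, princomp subtracts the column means (stored in PCAcc$center) and divides by the column standard deviations (stored in PCAcc$scale), and those standard deviations use divisor n rather than n - 1. For this matrix every entry of PCAcc$scale is sqrt(17.5/6) ≈ 1.707825, which is exactly the Comp.1 factor. Equal column variances mean standardizing does not change the loadings, but it still divides every score by that common standard deviation. Comp.2 and Comp.3 are zero up to floating-point rounding error, so the ratios 2 and 1.333333 carry no meaning.

scale(cc, PCAcc$center, PCAcc$scale) %*% PCAcc$loadings

or, more simply,

predict(PCAcc, newdata = cc)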
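The scale() call above can also be spelled out step by step. This is a sketch that reuses the question's centering and makes the divisor-n standard deviation explicit; sd_n and manual are illustrative names, not part of princomp's output. Each all.equal() check is expected to return TRUE, since the near-zero Comp.2 to Comp.4 entries contribute negligibly to all.equal's relative tolerance.

```r
cc <- matrix(1:24, ncol = 4)
PCAcc <- princomp(cc, scores = TRUE, cor = TRUE)

# Center each column, as in the question's "by hand" attempt
rescaled <- sweep(cc, 2, colMeans(cc))

# Standard deviation of each column with divisor n (princomp's convention)
n <- nrow(cc)
sd_n <- sqrt(colSums(rescaled^2) / n)

# Standardize, then project onto the loadings
manual <- sweep(rescaled, 2, sd_n, "/") %*% PCAcc$loadings

# Compare with princomp's stored scale, its stored scores, and predict()
all.equal(unname(sd_n), unname(PCAcc$scale))
all.equal(unname(manual), unname(PCAcc$scores))
all.equal(unname(manual), unname(predict(PCAcc, newdata = cc)))
```

Using sd() in place of sd_n would leave every score too small by a factor of sqrt(5/6), which is why the divisor matters even when all columns share the same variance.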

answered Sep 23 '22 22:09

Ian Fellows
