Let's say I have a data matrix d:
pc = prcomp(d)
# pc1 and pc2 are the principal components
pc1 = pc$rotation[,1]
pc2 = pc$rotation[,2]
Then this should fit the linear regression model, right?
r = lm(y ~ pc1+pc2)
But then I get this error:
Error in model.frame.default(formula = y ~ pc1 + pc2, drop.unused.levels = TRUE) :
  variable lengths differ (found for 'pc1')
I guess there are packages out there that do this automatically, but this should work too, right?
PCA has been used in linear regression to serve two basic goals. The first is dimensionality reduction on datasets where the number of predictor variables is too high; in that role it is an alternative to Partial Least Squares regression. The second is to replace correlated predictors with uncorrelated components. A package that automates the whole procedure is sketched below.
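On the packages question: one option is pcr() from the pls package, which runs the PCA step internally and fits the regression in one call. A minimal sketch, assuming the pls package is installed (the data here are made up for illustration):
library(pls)
set.seed(1)
dat = data.frame(x1 = runif(100), x2 = runif(100))
dat$y = 2 + 3*dat$x1 + 4*dat$x2 + rnorm(100)
fit = pcr(y ~ x1 + x2, ncomp = 2, data = dat)
summary(fit)  # proportion of variance explained per component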
Answer: you don't want pc$rotation; that is the rotation (loadings) matrix, not the matrix of rotated values (the scores). Its columns have one entry per variable rather than per observation, which is why the lengths don't match y.
Make up some data:
x1 = runif(100)
x2 = runif(100)
# Note: rnorm()'s first argument is the sample size, so this draws 100
# standard normals and y does not actually depend on x1 and x2; for a real
# signal you would use y = rnorm(100, mean = 2+3*x1+4*x2)
y = rnorm(2+3*x1+4*x2)
d = cbind(x1,x2)
pc = prcomp(d)
dim(pc$rotation)
## [1] 2 2
Oops. The "x" component is what we want. From ?prcomp:
x: if ‘retx’ is true the value of the rotated data (the centred (and scaled if requested) data multiplied by the ‘rotation’ matrix) is returned.
dim(pc$x)
## [1] 100 2
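As a sanity check, pc$x is exactly what the help page describes, the centred data multiplied by the rotation matrix:
all.equal(pc$x, scale(d, center = TRUE, scale = FALSE) %*% pc$rotation)
## [1] TRUE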
lm(y~pc$x[,1]+pc$x[,2])
##
## Call:
## lm(formula = y ~ pc$x[, 1] + pc$x[, 2])
##
## Coefficients:
##  (Intercept)    pc$x[, 1]    pc$x[, 2]
##      0.04942      0.14272     -0.13557
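For nicer coefficient names, note that prcomp() names the score columns "PC1" and "PC2", so an equivalent fit via a data frame is:
scores = data.frame(y, pc$x)           # columns: y, PC1, PC2
r = lm(y ~ PC1 + PC2, data = scores)
coef(r)                                # same coefficients as above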