
Adding principal components as variables to a data frame

I am working with a dataset of 10000 data points and 100 variables in R. Unfortunately, the variables I have do not describe the data well. I carried out a PCA using prcomp(), and the first 3 PCs seem to account for most of the variability in the data. As far as I understand, a principal component is a combination of the original variables; it therefore has a value for each data point and can be treated as a new variable. Can I add these principal components to my data as 3 new variables? I need them for further analysis.

A reproducible dataset:

set.seed(144)
x <- data.frame(matrix(rnorm(2^10*12), ncol=12))
y <- prcomp(formula = ~., data=x, center = TRUE, scale = TRUE, na.action = na.omit)
Error404 asked Nov 13 '13 11:11




1 Answer

The PC scores are stored in the element x of the prcomp() result.

str(y)

List of 6
 $ sdev    : num [1:12] 1.08 1.06 1.05 1.04 1.03 ...
 $ rotation: num [1:12, 1:12] -0.0175 -0.1312 0.3284 -0.4134 0.2341 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:12] "X1" "X2" "X3" "X4" ...
  .. ..$ : chr [1:12] "PC1" "PC2" "PC3" "PC4" ...
 $ center  : Named num [1:12] 0.02741 -0.01692 -0.03228 -0.03303 0.00122 ...
  ..- attr(*, "names")= chr [1:12] "X1" "X2" "X3" "X4" ...
 $ scale   : Named num [1:12] 0.998 1.057 1.019 1.007 0.993 ...
  ..- attr(*, "names")= chr [1:12] "X1" "X2" "X3" "X4" ...
 $ x       : num [1:1024, 1:12] 1.023 -1.213 0.167 -0.118 -0.186 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:1024] "1" "2" "3" "4" ...
  .. ..$ : chr [1:12] "PC1" "PC2" "PC3" "PC4" ...
 $ call    : language prcomp(formula = ~., data = x, na.action = na.omit, center = TRUE, scale = TRUE)
 - attr(*, "class")= chr "prcomp"
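As a quick check that these scores really are linear combinations of the original variables, the sketch below rebuilds them by hand from the center, scale and rotation elements shown above. The names z and manual_scores are introduced here for illustration only, and the comparison ignores attributes because the two matrices carry different row names.

```r
# Rebuild the example data and the PCA from the question.
set.seed(144)
x <- data.frame(matrix(rnorm(2^10 * 12), ncol = 12))
y <- prcomp(~ ., data = x, center = TRUE, scale. = TRUE)

# Standardise the original variables with the centre and scale stored by
# prcomp(), then project them onto the loadings (the columns of y$rotation).
z <- scale(x, center = y$center, scale = y$scale)
manual_scores <- z %*% y$rotation

# Compare with the stored scores; this should report TRUE up to
# floating-point tolerance.
all.equal(manual_scores, y$x, check.attributes = FALSE)
```

Each column of manual_scores is a weighted sum of the standardised variables, which is why it can be used like any other numeric variable.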

You can get them with y$x and then choose the columns you need.

x.new <- cbind(x, y$x[, 1:3])
str(x.new)

'data.frame':   1024 obs. of  15 variables:
 $ X1 : num  1.14 2.38 0.684 1.785 0.313 ...
 $ X2 : num  -0.689 0.446 -0.72 -3.511 0.36 ...
 $ X3 : num  0.722 0.816 0.295 -0.48 0.566 ...
 $ X4 : num  1.629 0.738 0.85 1.057 0.116 ...
 $ X5 : num  -0.737 -0.827 0.65 -0.496 -1.045 ...
 $ X6 : num  0.347 0.056 -0.606 1.077 0.257 ...
 $ X7 : num  -0.773 1.042 2.149 -0.599 0.516 ...
 $ X8 : num  2.05511 0.4772 0.18614 0.02585 0.00619 ...
 $ X9 : num  -0.0462 1.3784 -0.2489 0.1625 0.6137 ...
 $ X10: num  -0.709 0.755 0.463 -0.594 -1.228 ...
 $ X11: num  -1.233 -0.376 -2.646 1.094 0.207 ...
 $ X12: num  -0.44 -2.049 0.315 0.157 2.245 ...
 $ PC1: num  1.023 -1.213 0.167 -0.118 -0.186 ...
 $ PC2: num  1.2408 0.6077 1.1885 3.0789 0.0797 ...
 $ PC3: num  -0.776 -1.41 0.977 -1.343 0.987 ...
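One caveat: the cbind() step relies on y$x having one row per row of x. The example data has no missing values, so this holds, but with na.action = na.omit any incomplete row is dropped from the scores and cbind() would then fail on the differing row counts. A minimal sketch of one way around this, assuming a single artificial NA (x_na, y_na and x_new are names made up for this illustration), is to use na.exclude, which pads the scores with NA for the excluded rows:

```r
# Same example data, with one artificial missing value for illustration.
set.seed(144)
x <- data.frame(matrix(rnorm(2^10 * 12), ncol = 12))
x_na <- x
x_na[5, 2] <- NA

# na.exclude leaves incomplete rows out of the fit but pads the scores
# with NA, so y_na$x keeps one row per row of x_na.
y_na <- prcomp(~ ., data = x_na, center = TRUE, scale. = TRUE,
               na.action = na.exclude)

# The row counts match, so the first three PCs can be bound on directly.
x_new <- cbind(x_na, y_na$x[, 1:3])
dim(x_new)
x_new[4:6, c("X2", "PC1", "PC2", "PC3")]
```

Row 5 of x_new has NA in the PC columns, which keeps the scores aligned with the original rows for later analysis.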
Didzis Elferts answered Sep 21 '22 19:09
