
R function prcomp fails with NA values even though NAs are allowed

Tags:

r

na

pca

I am using the function prcomp to calculate the first two principal components. However, my data has some NA values, so the function throws an error. The na.action I specify seems to have no effect, even though the argument is documented in the help file ?prcomp.

Here is my example:

d <- data.frame(V1 = sample(1:100, 10), V2 = sample(1:100, 10))
prcomp(d, center = TRUE, scale = TRUE, na.action = na.omit)
d$V1[5] <- NA
d$V2[7] <- NA
prcomp(d, center = TRUE, scale = TRUE, na.action = na.omit)

I am using the newest R version 2.15.1 for Mac OS X.

Can anybody see the reason why prcomp fails?

Here is my new example:

d <- data.frame(V1 = sample(1:100, 10), V2 = sample(1:100, 10))
result <- prcomp(d, center = TRUE, scale = TRUE, na.action = na.omit)
result$x
d$V1[5] <- NA
result <- prcomp(~V1+V2, data=d, center = TRUE, scale = TRUE, na.action = na.omit)
result$x

Is it possible to retain row 5 in PC1 and PC2? My real data set of course has more than two variable columns, and only some values in a row are missing, so I do not want to lose the information carried by the remaining values.

user969113 asked Aug 22 '12 17:08




1 Answer

Another solution, if you do not want to use the formula interface, is

prcomp(na.omit(d), center = TRUE, scale = TRUE) 

which consists of applying na.omit directly to the data frame before it is passed to prcomp.
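As a rough sketch of how the two approaches compare: prcomp only applies na.action in its formula method, so passing it together with a plain data frame has no effect on the missing values. The seed and the object names res_omit, res_formula and res_exclude below are assumptions made for illustration, and scale. is the full name of the argument that the examples above abbreviate as scale.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
d <- data.frame(V1 = sample(1:100, 10), V2 = sample(1:100, 10))
d$V1[5] <- NA

# Default (data frame) method: drop incomplete rows before the call.
res_omit <- prcomp(na.omit(d), center = TRUE, scale. = TRUE)

# Formula method: na.action is applied here, giving the same fit.
res_formula <- prcomp(~ V1 + V2, data = d, center = TRUE, scale. = TRUE,
                      na.action = na.omit)

# na.exclude fits on the complete rows only, but pads the scores with NA
# so that x has one row per original observation.
res_exclude <- prcomp(~ V1 + V2, data = d, center = TRUE, scale. = TRUE,
                      na.action = na.exclude)

nrow(res_omit$x)           # number of complete rows
nrow(res_exclude$x)        # number of rows in d
is.na(res_exclude$x[5, ])  # scores for the incomplete row
```

Note that na.exclude only keeps the scores aligned with the original rows. The scores for row 5 are NA, so the non-missing values in that row are still not used. Using them would need imputation or a PCA method designed for missing data, which is beyond this sketch.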

Jilber Urbina answered Oct 09 '22 04:10