I am using the function prcomp to calculate the first two principal components. However, my data has some NA values, so the function throws an error. The na.action argument does not seem to work, even though it is mentioned in the help file ?prcomp.
Here is my example:
d <- data.frame(V1 = sample(1:100, 10), V2 = sample(1:100, 10))
prcomp(d, center = TRUE, scale = TRUE, na.action = na.omit)
d$V1[5] <- NA
d$V2[7] <- NA
prcomp(d, center = TRUE, scale = TRUE, na.action = na.omit)
I am using the newest R version 2.15.1 for Mac OS X.
Can anybody see the reason why prcomp fails?
Here is my new example:
d <- data.frame(V1 = sample(1:100, 10), V2 = sample(1:100, 10))
result <- prcomp(d, center = TRUE, scale = TRUE, na.action = na.omit)
result$x
d$V1[5] <- NA
result <- prcomp(~V1+V2, data=d, center = TRUE, scale = TRUE, na.action = na.omit)
result$x
Is it possible to retain row 5 in PC1 and PC2? In my real data set I of course have more than two columns of variables, and only some of them have missing values. I do not want to lose the information contained in the remaining values!
Input to the PCA can be any set of numerical variables; however, they should be scaled relative to each other, and traditional PCA will not accept any missing data points. Data points will be scored by how well they fit into a principal component (PC) based upon a measure of variance within the dataset.
prcomp and princomp give different results even when both use the covariance matrix. When computing the variances, prcomp uses n−1 as the denominator but princomp uses n. The difference between these two denominators is explained in this tutorial on principal component analysis.
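The denominator difference can be checked numerically. The following is only a minimal sketch on simulated data (the matrix X and the seed are made up for illustration): the standard deviations reported by prcomp should be larger than those from princomp by a constant factor of sqrt(n / (n - 1)).

```r
# Simulated data, purely illustrative: 20 observations, 3 variables.
set.seed(1)
X <- matrix(rnorm(60), ncol = 3)
n <- nrow(X)

# Both calls use the covariance matrix (no scaling to unit variance).
sd_prcomp   <- prcomp(X)$sdev            # variance divisor n - 1
sd_princomp <- unname(princomp(X)$sdev)  # variance divisor n

# The ratio should be the same for every component.
ratio    <- sd_prcomp / sd_princomp
expected <- sqrt(n / (n - 1))
print(ratio)
print(expected)
```

With only a handful of rows the factor is noticeable; for large n it becomes negligible.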
prcomp returns a list with class "prcomp" containing the following components: sdev. the standard deviations of the principal components (i.e., the square roots of the eigenvalues of the covariance/correlation matrix, though the calculation is actually done with the singular values of the data matrix). rotation.
There are two general methods to perform PCA in R: spectral decomposition, which examines the covariances / correlations between variables, and singular value decomposition, which examines the covariances / correlations between individuals.
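As a rough illustration of how the two routes relate, the sketch below (simulated data, illustrative variable names) computes the component standard deviations once from the eigenvalues of the covariance matrix and once from the singular values of the centered data matrix, and compares both with what prcomp reports.

```r
# Simulated data, purely illustrative: 20 observations, 3 variables.
set.seed(42)
X  <- matrix(rnorm(60), ncol = 3)
Xc <- scale(X, center = TRUE, scale = FALSE)  # center columns only

# Spectral decomposition: eigenvalues of the covariance matrix.
sdev_eigen <- sqrt(eigen(cov(X), symmetric = TRUE)$values)

# Singular value decomposition of the centered data matrix.
sdev_svd <- svd(Xc)$d / sqrt(nrow(X) - 1)

# prcomp itself works from the singular values.
sdev_prcomp <- prcomp(X)$sdev

print(rbind(sdev_eigen, sdev_svd, sdev_prcomp))
```

The three rows should agree up to floating-point error, since the squared singular values of the centered matrix divided by n − 1 are the eigenvalues of the sample covariance matrix.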
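The na.action argument is only honoured by the formula method of prcomp; the default (data frame / matrix) method does not use it. Another solution, if you're not willing to use the formula interface, is
prcomp(na.omit(d), center = TRUE, scale = TRUE)
which consists of applying na.omit directly to the data frame.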
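For comparison, here is a minimal sketch of the options side by side, using the example data from the question (the seed and the object names res_direct, res_omit and res_excl are only illustrative). It also tries na.action = na.exclude with the formula interface, which should keep row 5 in the score matrix as a row of NA values rather than dropping it. Note that the scores for that row are still missing: making use of the non-missing values in an incomplete row would need imputation or a PCA variant that tolerates missing data, which this sketch does not attempt.

```r
set.seed(123)
d <- data.frame(V1 = sample(1:100, 10), V2 = sample(1:100, 10))
d$V1[5] <- NA

# Option 1: drop incomplete rows before calling prcomp.
# (scale. is the documented argument name; scale works by partial matching.)
res_direct <- prcomp(na.omit(d), center = TRUE, scale. = TRUE)

# Option 2: formula interface, incomplete rows dropped.
res_omit <- prcomp(~ V1 + V2, data = d, center = TRUE, scale. = TRUE,
                   na.action = na.omit)

# Option 3: formula interface, incomplete rows kept as NA in the scores.
res_excl <- prcomp(~ V1 + V2, data = d, center = TRUE, scale. = TRUE,
                   na.action = na.exclude)

print(nrow(res_direct$x))  # rows after dropping the incomplete case
print(nrow(res_omit$x))
print(nrow(res_excl$x))    # original number of rows
print(res_excl$x[5, ])     # scores for the incomplete row
```

Options 1 and 2 fit the same model on the nine complete rows; option 3 fits the same model but pads the scores so that they line up with the rows of d.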