
prcomp and ggbiplot: invalid 'rot' value

I'm trying to run a PCA on my data in R, and I found this nice guide that uses prcomp and ggbiplot. My data consist of two sample types with three biological replicates each (i.e. 6 rows) and around 20000 genes (i.e. variables). First, fitting the PCA model with the code described in the guide doesn't work:

> pca=prcomp(data,center=T,scale.=T)
Error in prcomp.default(data, center = T, scale. = T) : 
cannot rescale a constant/zero column to unit variance

However, if I remove the scale. = T part, it works just fine and I get a model. Why is this, and is this the cause of the error below?

> summary(pca)
Importance of components:
                             PC1       PC2       PC3       PC4       PC5
Standard deviation     4662.8657 3570.7164 2717.8351 1419.3137 819.15844
Proportion of Variance    0.4879    0.2861    0.1658    0.0452   0.01506
Cumulative Proportion     0.4879    0.7740    0.9397    0.9849   1.00000

Secondly, plotting the PCA. Even just using the basic code, I get an error and an empty plot image:

> ggbiplot(pca)
Error: invalid 'rot' value

What does this mean, and how can I fix it? Does it have something to do with not scaling when fitting the PCA, or is it something different? I think it must be something with my data, since the standard example code (below) gives me a really nice PCA plot.

> data(wine)
> wine.pca=prcomp(wine,scale.=T)
> print(ggbiplot(wine.pca, obs.scale = 1, var.scale = 1, groups = wine.class, 
  ellipse = TRUE, circle = TRUE))

[EDIT 1] I have tried subsetting my data in two ways: 1) removing all columns where all rows are 0, and 2) removing all columns where any row is 0. The first subset still gives me the scale error, but the second one does not. Why is this? How does this affect my PCA?

Also, I tried using the normal biplot command on both the original data (non-scaled) and the subsetted data above, and it works in both cases. So is it something to do with ggbiplot?

[EDIT 2] I have uploaded a subset of my data that gives me the error when I don't remove all the zeroes and works when I do. I haven't used gist before, but I think this is it. Or this...

erikfas asked Nov 19 '14 12:11



1 Answer

After transposing your data, I was able to replicate your error. The first error is the primary problem. PCA seeks to maximize the variance captured by each component, so it is important to scale the variables; otherwise the result can be dominated by a single variable that happens to have a very high variance. The first error:

Error in prcomp.default(tdf, center = T, scale. = T) : 
  cannot rescale a constant/zero column to unit variance

This tells you that some of your variables have zero variance (i.e. no variability), so they cannot be divided by their standard deviation when scaling. Since PCA works by maximizing variance, there is no point in retaining these variables. They can easily be removed with the following call:

df_f <- data[,apply(data, 2, var, na.rm=TRUE) != 0]
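Regarding EDIT 1 in the question, one possible explanation is that some columns are constant but not zero. The sketch below uses made-up toy data (the data frame `toy`, its column names, and the helper `try_scaled_pca` are hypothetical, not taken from the uploaded data) to show that dropping only the all-zero columns can still leave a zero-variance column behind, while the variance filter above removes both kinds.

```r
# Toy data (hypothetical): 6 samples x 5 "genes"
set.seed(1)
toy <- data.frame(
  g1 = rnorm(6),
  g2 = rnorm(6),
  g3 = rnorm(6),
  all_zero = 0,       # zero in every sample
  const_nonzero = 7   # same non-zero value in every sample
)

# Helper (hypothetical): returns "ok" or the error message from a scaled PCA
try_scaled_pca <- function(x) {
  tryCatch({
    prcomp(x, center = TRUE, scale. = TRUE)
    "ok"
  }, error = function(e) conditionMessage(e))
}

# 1) Original data: scaling fails
msg_raw <- try_scaled_pca(toy)

# 2) Drop only columns that are zero in every row:
#    const_nonzero survives, so scaling still fails
no_all_zero <- toy[, colSums(toy != 0) > 0]
msg_no_zero <- try_scaled_pca(no_all_zero)

# 3) Drop every zero-variance column: scaling works
toy_f <- toy[, apply(toy, 2, var, na.rm = TRUE) != 0]
msg_filtered <- try_scaled_pca(toy_f)

print(c(raw = msg_raw, no_all_zero = msg_no_zero, filtered = msg_filtered))
print(colnames(toy_f))
```

Removing every column that contains any zero is a much stricter filter than needed; it happens to work when all the constant columns contain a zero, but it can also discard informative genes.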

Once this filter is applied, the remaining calls work as expected:

pca <- prcomp(df_f, center = TRUE, scale. = TRUE)
ggbiplot(pca)
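As for why the unscaled model fits but then fails to plot, here is a minimal sketch using only base R and made-up data (`toy` and the column `const` are hypothetical). It shows that a constant column gets loadings of zero (up to rounding) on the first two components of an unscaled PCA. I have not checked this against the ggbiplot source, so treat the link to the 'rot' error as an assumption: if the label angle for each variable is derived from the ratio of its two loadings, a 0/0 would give NaN, which is not a valid rotation.

```r
# Toy data (hypothetical): three variable columns plus one constant column
set.seed(1)
toy <- data.frame(
  g1 = rnorm(6),
  g2 = rnorm(6),
  g3 = rnorm(6),
  const = 7
)

# Unscaled PCA fits without complaint, even with the constant column
pca_raw <- prcomp(toy, center = TRUE)

# Loadings of the constant column on PC1 and PC2 (zero up to rounding)
load12 <- pca_raw$rotation["const", 1:2]
print(load12)
is_zero_loading <- all(abs(load12) < 1e-8)

# An angle computed from a 0/0 ratio is not a number
angle <- atan(0 / 0)
print(angle)  # NaN
```

Under that assumption, filtering out the zero-variance columns addresses both errors at once, which matches the behaviour described above.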
cdeterman answered Nov 05 '22 23:11
