I am trying to run PCA on a data frame with 5000 columns and 30 rows:
Sample <- read.table(file.choose(), header=F,sep="\t")
Sample.scaled <- data.frame(apply(Sample,2,scale))
pca.Sample <- prcomp(Sample.scaled,retx=TRUE)
This gives the error:
Error in svd(x, nu = 0) : infinite or missing values in 'x'
sum(is.na(Sample))
[1] 0
sum(is.na(Sample.scaled))
[1] 90
I tried to drop all NA values with the following:
pca.Sample <- prcomp(na.omit(Sample.scaled),retx=TRUE)
Which gives the following error
Error in svd(x, nu = 0) : 0 extent dimensions
I read reports that na.action only takes effect when a formula is given, so I tried the following:
pca.Sample <- prcomp(~.,center=TRUE,scale=TRUE,Sample, na.action=na.omit)
Now getting the following error
Error in prcomp.default(x, ...) :
cannot rescale a constant/zero column to unit variance
I think the problem might be this: "One of my data columns is constant. The variance of a constant is 0, and scaling would then divide by 0, which is impossible."
But I am not sure how to tackle it. Any help is much appreciated.
Judging by the fact that sum(is.na(Sample.scaled)) comes out as 90 when sum(is.na(Sample)) was 0, it looks like you've got three constant columns: each constant column is scaled to NA in all 30 rows, and 3 x 30 = 90.
Here's a randomly generated (reproducible) example, which gives the same error messages:
Sample <- matrix(rnorm(30 * 5000), 30)
Sample[, c(128, 256, 512)] <- 1
Sample <- data.frame(Sample)
Sample.scaled <- data.frame(apply(Sample, 2, scale))
> sum(is.na(Sample))
[1] 0
> sum(is.na(Sample.scaled))
[1] 90
# constant columns are "scaled" to NA.
> pca.Sample <- prcomp(Sample.scaled,retx=TRUE)
Error in svd(x, nu = 0) : infinite or missing values in 'x'
# 3 columns are entirely NA, so every row contains an NA and na.omit omits every row
> pca.Sample <- prcomp(na.omit(Sample.scaled),retx=TRUE)
Error in svd(x, nu = 0) : 0 extent dimensions
# can't scale the 3 constant columns
> pca.Sample <- prcomp(~.,center=TRUE,scale=TRUE,Sample, na.action=na.omit)
Error in prcomp.default(x, ...) :
cannot rescale a constant/zero column to unit variance
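To confirm which columns are the constant ones, here is a minimal sketch that reuses the example above. The names n_na and constant_cols are illustrative only, and the seed is added just to make the run repeatable. It counts the NAs in each column of the scaled data and reports the columns that are NA in every row.

```r
set.seed(42)  # assumption: seed added only so the sketch is repeatable
Sample <- matrix(rnorm(30 * 5000), 30)
Sample[, c(128, 256, 512)] <- 1
Sample <- data.frame(Sample)
Sample.scaled <- data.frame(apply(Sample, 2, scale))

# Count NAs per column; a constant column is scaled to NaN (0 / 0) in every row,
# and is.na() reports NaN as missing
n_na <- colSums(is.na(Sample.scaled))

# Columns that are missing in all rows are the constant ones
constant_cols <- which(n_na == nrow(Sample.scaled))
print(constant_cols)
```

On your real data the same two lines applied to Sample.scaled should point at the three offending columns.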
You could try something like:
Sample.scaled.2 <- data.frame(t(na.omit(t(Sample.scaled))))
pca.Sample.2 <- prcomp(Sample.scaled.2, retx=TRUE)
i.e. use na.omit on the transpose to get rid of the NA columns rather than rows.
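An alternative sketch, assuming you are happy to discard the constant columns: remove them from the raw data first and let prcomp do the centering and scaling itself. The names is_varying, Sample.varying and pca.Sample.3 are hypothetical, and the check used here (more than one distinct value per column) is one possible definition of "not constant".

```r
set.seed(42)  # assumption: seed added only so the sketch is repeatable
Sample <- data.frame(matrix(rnorm(30 * 5000), 30))
Sample[, c(128, 256, 512)] <- 1

# TRUE for columns that contain more than one distinct value
is_varying <- vapply(Sample, function(x) length(unique(x)) > 1, logical(1))

# Drop the constant columns; drop = FALSE keeps the result a data frame
Sample.varying <- Sample[, is_varying, drop = FALSE]

# prcomp centers and scales each remaining column internally
pca.Sample.3 <- prcomp(Sample.varying, center = TRUE, scale. = TRUE)
dim(pca.Sample.3$x)
```

This avoids the intermediate NA values altogether, since no column with zero variance ever reaches the scaling step.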
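If the data contain zeros, log() returns -Inf for them. These negative infinity values can be replaced after the log transform as below (this assumes data_matrix is a numeric matrix).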
log_features <- log(data_matrix[,1:8])
log_features[is.infinite(log_features)] <- -99999