
Error in svd(x, nu = 0) : 0 extent dimensions

Tags: r

I am trying to run PCA on a data frame with 5000 columns and 30 rows:

Sample <- read.table(file.choose(), header=F,sep="\t")
Sample.scaled <- data.frame(apply(Sample,2,scale))
pca.Sample <- prcomp(Sample.scaled,retx=TRUE)

I got this error:

Error in svd(x, nu = 0) : infinite or missing values in 'x'

sum(is.na(Sample))
[1] 0

sum(is.na(Sample.scaled))
[1] 90

I tried to ignore all NA values with the following:

pca.Sample <- prcomp(na.omit(Sample.scaled),retx=TRUE)

Which gives the following error

Error in svd(x, nu = 0) : 0 extent dimensions

I read reports that na.action only takes effect when a formula is given, so I tried the following:

pca.Sample <- prcomp(~.,center=TRUE,scale=TRUE,Sample, na.action=na.omit)

Now getting the following error

Error in prcomp.default(x, ...) :
  cannot rescale a constant/zero column to unit variance

I think the problem might be this: "One of my data columns is constant. The variance of a constant is 0, and scaling would then divide by 0, which is impossible."

But I am not sure how to tackle this. Any help is much appreciated.

Tinu Thomas asked Nov 12 '12 22:11


2 Answers

Judging by the fact that sum(is.na(Sample.scaled)) comes out as 90 when sum(is.na(Sample)) was 0, it looks like you've got three constant columns (90 missing values / 30 rows = 3 columns).

Here's a randomly generated (reproducible) example, which gives the same error messages:

Sample <- matrix(rnorm(30 * 5000), 30)
Sample[, c(128, 256, 512)] <- 1

Sample <- data.frame(Sample)
Sample.scaled <- data.frame(apply(Sample, 2, scale))

> sum(is.na(Sample))
[1] 0

> sum(is.na(Sample.scaled))
[1] 90

# constant columns are "scaled" to NaN (0/0), which is.na() counts as missing
> pca.Sample <- prcomp(Sample.scaled,retx=TRUE)
Error in svd(x, nu = 0) : infinite or missing values in 'x'

# 3 columns are entirely NA, so every row contains an NA and na.omit drops all of them
> pca.Sample <- prcomp(na.omit(Sample.scaled),retx=TRUE)
Error in svd(x, nu = 0) : 0 extent dimensions

# can't scale the 3 constant columns
> pca.Sample <- prcomp(~.,center=TRUE,scale=TRUE,Sample, na.action=na.omit)
Error in prcomp.default(x, ...) : 
  cannot rescale a constant/zero column to unit variance

You could try something like:

Sample.scaled.2 <- data.frame(t(na.omit(t(Sample.scaled))))
pca.Sample.2 <- prcomp(Sample.scaled.2, retx=TRUE)

That is, apply na.omit to the transpose so that it drops the NA columns rather than the rows.
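
Another option, sketched here on the assumption that every column is numeric and has no missing values, is to drop the constant columns before any scaling and let prcomp do the centering and scaling itself. The names `is.constant`, `Sample.clean` and `pca.Sample.3` are illustrative and not from the question; the simulated data mirrors the example above.

```r
set.seed(1)  # only so the simulated data is reproducible
Sample <- data.frame(matrix(rnorm(30 * 5000), 30))
Sample[, c(128, 256, 512)] <- 1  # three constant columns, as in the example above

# A column is constant when its smallest and largest values coincide
is.constant <- vapply(Sample, function(col) min(col) == max(col), logical(1))
which(is.constant)  # positions of the offending columns

# Keep only the non-constant columns, then let prcomp centre and scale them
Sample.clean <- Sample[, !is.constant]
pca.Sample.3 <- prcomp(Sample.clean, center = TRUE, scale. = TRUE, retx = TRUE)
dim(pca.Sample.3$x)
```

Filtering on the raw data rather than on Sample.scaled keeps a record of which columns were removed, which may be useful when interpreting the loadings afterwards.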

pete answered Nov 20 '22 22:11


Negative infinity values can be replaced after a log transform as below.

log_features <- log(data_matrix[,1:8])
log_features[is.infinite(log_features)] <- -99999
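
For context, log(0) is -Inf in R, and svd rejects infinite values with the same "infinite or missing values" error it gives for NA. A minimal sketch on a made-up numeric matrix follows; the values in `data_matrix` are invented for demonstration, and -99999 is simply the arbitrary sentinel used above.

```r
# Toy stand-in for data_matrix; the zero entries become -Inf under log()
data_matrix <- matrix(c(1, 0, 10, 100, 0, 1000), nrow = 3)

log_features <- log(data_matrix)
sum(is.infinite(log_features))  # number of -Inf entries produced by log(0)

# Replace the -Inf entries with a large negative sentinel value
log_features[is.infinite(log_features)] <- -99999
all(is.finite(log_features))    # check that no infinite values remain
```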

Joshua Garrison Burkhart answered Nov 20 '22 23:11