 

Error when using ComBat

Tags: r, bioconductor

I'm quite new to R and I'm trying to use the ComBat function from the R sva package on a 331 x 89 matrix of gene expression values. My data consists of 5 batches and is ordered by batch, so the first 106 rows correspond to batch 1, the next 106 to batch 2, and so on.

 batch1 <- rep(1,times=106)
 batch2 <- rep(2,times=106)
 batch3 <- rep(3,times=39)
 batch4 <- rep(4,times=26)
 batch5 <- rep(5,times=54)
 batch.type <- as.factor(c(batch1,batch2,batch3,batch4,batch5)) 

Then I try to use ComBat using this command:

 ComBat(data,batch=batch.type,mod=NULL)

And I get the following readout and error message:

"Found 5 batches
Found 0  categorical covariate(s)
Standardizing Data across genes
Error in solve(t(design) %*% design) %*% t(design) %*% t(as.matrix(dat)) : 
  non-conformable arguments"
user2846211 asked Feb 03 '14 16:02



1 Answer

I've used ComBat in both the sva package (just this morning!) and the InSilicoDb package (about a year ago), and have gotten similar errors from the ComBat method in both packages. While there are other threads in a similar vein, the error messages can differ. I've encountered "Error in solve(t(design) %*% design) %*% t(design) %*% t(as.matrix(dat)) : non-conformable arguments" and "Error in while (change > conv) { : missing value where TRUE/FALSE needed", in both cases when the variance across all my samples was not high enough.

The latter error is a bit more helpful: you can intuit from "change" not being big enough that something's going on with variance. I often use a script I wrote to filter out very low-varying genes. Coefficient of variation is probably a better (standardized) metric, but if I threshold rather aggressively, removing all genes with a variance below 1, ComBat then runs reliably. (I just tried a lower threshold of 0.5 and it gave the error message you found, so this is probably a bit data-dependent too.)
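A minimal sketch of this kind of variance filter, in base R. This is not the original script mentioned above: `filter_low_variance`, its `min_var` argument, and the toy matrix are made up for illustration, and the code assumes genes are in rows and samples in columns, which is the probe x sample layout ComBat documents for its `dat` argument.

```r
# Hypothetical helper (not part of sva): keep only the rows (genes) whose
# variance across samples is above a threshold.
# Assumes `expr` is a numeric matrix with genes in rows and samples in columns.
filter_low_variance <- function(expr, min_var = 1) {
  gene_var <- apply(expr, 1, var, na.rm = TRUE)   # per-gene variance
  keep <- !is.na(gene_var) & gene_var > min_var   # drop NA and low-variance rows
  expr[keep, , drop = FALSE]                      # stay a matrix even with 1 row left
}

# Toy data for illustration only: 4 genes x 6 samples.
toy <- rbind(
  flat   = rep(5, 6),                                # constant gene, variance 0
  low    = 5 + c(-0.1, 0.1, -0.1, 0.1, -0.1, 0.1),   # variance well below 1
  high_a = c(1, 4, 7, 2, 9, 5),
  high_b = c(10, 2, 8, 3, 12, 1)
)

filtered <- filter_low_variance(toy, min_var = 1)
rownames(filtered)   # only the two high-variance genes remain
```

The filtered matrix could then be passed on as `ComBat(dat = filtered, batch = batch.type, mod = NULL)`, with `batch.type` holding one entry per column (sample). The threshold of 1 is just the value that happened to work here; a sensible cutoff depends on the scale of the data.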

Oddly enough, a stricter variance threshold has resolved both of those error messages for me. I think it must have to do with how the code iterates until convergence; perhaps low-varying genes can cause it to fail at a few different points in the program?

A thread on the "change > conv" message is linked below. That user got ComBat to run by removing only constant (variance = 0) genes, but that has never been enough for me. His/her dataset must have contained only genes with either zero variance or a large variance across all samples:

https://groups.google.com/forum/#!msg/combat-user-forum/_z8DxYQNFJ8/7UI_a2nCoUEJ

I don't love filtering out genes just because they're low-varying, but I haven't found a good method that can distinguish them from noise anyway.

Let me know if this helps. It would be great to know whether the error is 100% due to low-varying genes (or whatever features you have) across samples!

kplaney answered Sep 22 '22 17:09
