I'm quite new to R and I'm trying to use the ComBat function in the R sva library on a 331 x 89 matrix of gene expression values. My data consists of 5 batches and is ordered so that the first 106 rows correspond to batch 1, the next 106 correspond to batch 2, and so on.
batch1 <- rep(1,times=106)
batch2 <- rep(2,times=106)
batch3 <- rep(3,times=39)
batch4 <- rep(4,times=26)
batch5 <- rep(5,times=54)
batch.type <- as.factor(c(batch1,batch2,batch3,batch4,batch5))
Then I try to use ComBat using this command:
ComBat(data,batch=batch.type,mod=NULL)
And I get the following readout and error message:
"Found 5 batches
Found 0 categorical covariate(s)
Standardizing Data across genes
Error in solve(t(design) %*% design) %*% t(design) %*% t(as.matrix(dat)) :
non-conformable arguments"
I've used ComBat in both the sva package (just this morning!) and the InSilicoDb package (about a year ago), and have gotten similar errors from the ComBat method in both packages. While there are other threads in a similar vein, the error messages can differ. I've encountered both "Error in solve(t(design) %*% design) %*% t(design) %*% t(as.matrix(dat)) : non-conformable arguments" and "Error in while (change > conv) { : missing value where TRUE/FALSE needed", in both cases when the variance across my samples was not high enough.
The latter error is a bit more helpful: you can intuit from "change" not being big enough that something is going on with variance. I often use a script I wrote to filter out very low-varying genes. The coefficient of variation is probably a better (standardized) metric, but if I threshold rather aggressively and remove all genes with a variance below 1, ComBat then reliably runs. (I just tried a lower threshold of 0.5 and it gave the error message you found, so this is probably a bit data-dependent.)
Oddly enough, a stricter variance threshold has solved both of those error messages for me. I think it must have to do with how the code iterates until convergence; perhaps low-varying genes cause it to fail at a few points in the program?
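The variance filter described above can be sketched roughly as follows. This is a minimal illustration, not my actual script: the toy matrix `expr`, the helper name `filter_low_variance`, and its `min_var` argument are all made up for the example, and it assumes genes are in rows and samples in columns.

```r
# Hypothetical toy matrix: genes in rows, samples in columns.
expr <- rbind(
  gene_a = c(1, 5, 9, 13, 17, 21),            # clearly varying
  gene_b = c(2.0, 2.1, 1.9, 2.0, 2.1, 1.9),   # nearly constant
  gene_c = c(4, 4, 4, 4, 4, 4),               # constant (variance 0)
  gene_d = c(0, 3, 0, 3, 0, 3)                # moderately varying
)

# Hypothetical helper: keep only genes whose variance across samples
# is strictly greater than min_var.
filter_low_variance <- function(mat, min_var = 1) {
  gene_var <- apply(mat, 1, var)              # per-gene (row-wise) variance
  mat[gene_var > min_var, , drop = FALSE]     # drop = FALSE keeps a matrix
}

filtered <- filter_low_variance(expr, min_var = 1)
rownames(filtered)

# The filtered matrix would then be passed on, e.g. (requires sva):
# corrected <- sva::ComBat(dat = filtered, batch = batch.type, mod = NULL)
```

The threshold of 1 is only the value that happened to work for my data; a suitable cutoff likely depends on the scale of your expression values.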
A thread on the "change > conv" message is linked below. Unlike for that user, though, removing only constant (variance = 0) genes has never been sufficient for me to get ComBat to run. His/her dataset must have had genes with either zero variance or a large variance across all samples:
https://groups.google.com/forum/#!msg/combat-user-forum/_z8DxYQNFJ8/7UI_a2nCoUEJ
I don't love filtering out genes just because they're low-varying, but I haven't found a good method that can distinguish these from noise anyway.
Let me know if this helps - would be great to know if it's 100% due to low-varying genes (or whatever feature you have) across samples!
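Separately from variance, it may be worth checking the orientation of the matrix. The sva documentation describes ComBat's `dat` argument as a probe x sample matrix, and the failing expression in the error multiplies `t(design)` (one column per entry of `batch`) by `t(as.matrix(dat))`, so `batch` needs one entry per column of `dat`. In the question, the batch sizes sum to 331, which matches the rows rather than the 89 columns. The sketch below uses random stand-in data with that shape; `check_combat_input` is a hypothetical helper, not part of sva.

```r
# Stand-in with the question's shape: 331 samples in rows, 89 genes in columns.
data <- matrix(rnorm(331 * 89), nrow = 331, ncol = 89)
batch.type <- as.factor(rep(1:5, times = c(106, 106, 39, 26, 54)))

# ComBat needs one batch label per column of the matrix it receives.
length(batch.type) == ncol(data)      # FALSE: 331 labels vs 89 columns
length(batch.type) == ncol(t(data))   # TRUE once samples are in columns

# Hypothetical helper that fails early with a readable message.
check_combat_input <- function(dat, batch) {
  if (ncol(dat) != length(batch)) {
    stop(sprintf(
      "dat has %d columns but batch has %d entries; samples should be in columns",
      ncol(dat), length(batch)
    ))
  }
  invisible(TRUE)
}

check_combat_input(t(data), batch.type)

# With the orientation fixed, the call would look like this (requires sva):
# corrected <- sva::ComBat(dat = t(data), batch = batch.type, mod = NULL)
```

If the dimensions already match in your case, then the low-variance explanation above is the more likely culprit.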