 

glmulti Oversized candidate set

Error message:

SYSTEM: Windows 7 Ultimate, 64-bit, 16 GB physical RAM plus virtual memory, memory.limit(32000)

  1. What does this error message mean?

    In glmulti(y = "y", data = mydf, xr = c("x1", : !Oversized candidate set.

    mydf has 3.6 million rows and 150 columns of floats

  2. What steps can I take to work around it in glmulti?
  3. Are there any alternatives to glmulti in the R world?

R/64bit "Good Sport"

Yu Le asked Jul 18 '13 18:07



1 Answer

I have encountered the same problem. Here is what I have found so far:

  1. The number of rows does not seem to be the issue. The issue is that with 150 predictors the package cannot handle an exhaustive search (that is, fitting and comparing all possible models). In my experience, your specific error message, "Oversized candidate set", is triggered because you also allow pairwise interactions (level=2; set level=1 to exclude interactions). With level=1 you will most likely run into a different warning, "Too many predictors". In my (very limited) experimentation, the largest candidate set I got to work contained about a billion models: with n covariates each one is either in or out of a model, giving 2^n combinations, so n=30 covariates yield 2^30 = 1,073,741,824 models. Here is the code I used to evaluate this:

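    library(glmulti)
    # method="d" (diagnostic) only reports the size of the candidate set; no models are fitted
    out <- integer(50)
    for (i in 2:40)
      out[i] <- glmulti(names(data)[1], names(data)[2:i], method="d", level=1, crit=aic, data=data)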

    Once the loop reaches 31 covariates, the candidate set comes back with 0 models; from 33 covariates on, it starts returning the warning message. My "data" had about 100 variables and only around 1,000 rows but, as I said, the problem is the width of the dataset, not its depth.

  2. As I said, start by eliminating the interactions, then consider applying a variable reduction technique first to bring the number of variables down (factor analysis, principal components, or clustering). The drawback of these techniques is that you lose some explainability, although you keep the predictive power.

  3. The glmulti documentation compares the package with alternatives and highlights their use cases, benefits, and drawbacks.
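
Points 1 and 2 can be sketched in base R, without glmulti itself. The first part reproduces the 2^n arithmetic and shows how quickly the number of terms grows once pairwise interactions are allowed. The second part reduces a wide predictor matrix with principal components before any model search. Everything here is an illustrative assumption rather than part of glmulti: the helper names `n_models` and `n_terms_level2`, the synthetic data, and the 95% variance threshold are all made up for the example, and the counts ignore any marginality rules glmulti may apply.

```r
# Part 1: size of the candidate set (hypothetical helpers, not glmulti functions)
n_models <- function(n_terms) 2^n_terms          # each term is either in or out
n_terms_level2 <- function(n) n + choose(n, 2)   # main effects + pairwise interactions

n_models(30)          # 1073741824, the figure quoted in point 1
n_models(150)         # about 1.4e45, far beyond any exhaustive search
n_terms_level2(150)   # 11325 candidate terms once interactions are allowed

# Part 2: reduce the width of the data with PCA before searching
set.seed(1)
n <- 1000
# Synthetic stand-in for real data: 20 correlated predictors driven by 3 latent factors
latent <- matrix(rnorm(n * 3), n, 3)
X <- latent %*% matrix(rnorm(3 * 20), 3, 20) + matrix(rnorm(n * 20, sd = 0.1), n, 20)
colnames(X) <- paste0("x", 1:20)
y <- latent[, 1] - 2 * latent[, 2] + rnorm(n)

pca <- prcomp(X, center = TRUE, scale. = TRUE)
var_explained <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
k <- which(var_explained >= 0.95)[1]   # fewest components reaching the (arbitrary) 95% threshold

# Response plus the first k component scores (columns PC1, PC2, ...)
reduced <- data.frame(y = y, pca$x[, seq_len(k), drop = FALSE])
# 'reduced' could then be passed to glmulti() in place of the original data (not run here)
```

The trade-off is the one mentioned in point 2: the component scores are linear mixtures of the original variables, so a model selected on them is harder to interpret than one built on the raw columns.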

PS: I ran this on Win7, 64-bit, 16 GB RAM, R version 3.10, glmulti 1.07.

PPS: The author of the package was reportedly going to release version 2.0 last year, which would fix some of these issues.

Phill answered Oct 18 '22 12:10
