Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using the glmulti package in R for exhaustive search multiple regression for akaike weights

Tags:

r

regression

I was wondering if someone could help me understand why I am getting an error message when I enter a script into R. For abit of background information I am looking into the effect 6 different variables (which I think is 63 combinations or models) (X) have on gross primary and net ecosystem production (Y) seperatly at different spatial scales for my environmental science honours project. I have decided to use exhaustive search multiple regression analysis with akaikes information criterion (AIC) to try and find a group of models for best fit. (and hierarchical partitioning to compare variance attributed to different X variables) I want to get the weights so I can rank which models "best meet" the criterion see if there is one or a group of them that outfit the rest and therefore be a more likely fit to the data.

I recently posted a similar question on the hier.part package on Cross Validated received a great answer and was told to come here if I had any similar questions in the future.

The package I am using for R is glmulti. which can be found here

The script i am using is this

require(glmulti)
GPPANDDRIVER<-read.table("C:\\Databases at different scales for R\\River Rhine and Netherlands\\GPP and drivers rhineland (comma delimited).csv",header=T,sep=",")
GPP<-GPPANDDRIVER$GPP
IND_VARS<-subset(GPPANDDRIVER,select=-GPP)
#  glmulti S4 generic 
glmulti(y=GPP, xr=IND_VARS, data, exclude = c(), name = "glmulti.analysis", intercept = TRUE, marginality = FALSE, bunch=30, chunk = 1, chunks = 1,
level = 2, minsize = 0, maxsize = -1, minK = 0, maxK = -1, method = "h", crit = "aic", confsetsize = 63, popsize = 40, mutrate = 10^-3, sexrate = 0.1, imm = 0.3, plotty = TRUE, report = TRUE, deltaM = 0.05, deltaB = 0.05, conseq = 5, fitfunction = "glm", resumefile = "id", includeobjects=TRUE,)

Here is the link for the .csv data for sites in the rhineland mentioned in the example, http://www.filedropper.com/gppanddriversrhinelandcommadelimited

I am extremely new to R so I presumed popsize means the number of replicates which is 40 for this scale so I used 40, I also assumed confsetsize meant number of possible models which I believe is 63 due to the 6 variables?

If anyone could help it would be greatly appreciated

Thanks for you patience and apologies for the basic question

Richard

edit I just tried running the script this morning and it now crashes R.

like image 578
Richard N. Belcher Avatar asked Apr 16 '12 22:04

Richard N. Belcher


1 Answers

This worked for me. I think the main thing is not to blindly include all the parameters in the model call. Most of these have default values, thus (if the package writer has done their job) you should be able to leave them as they are and not worry too much (although of course you should RTFM and (try to) understand what they mean ...)

dat <- read.csv("GPPdriversRhineland.csv")
library(glmulti)

I decided to rename the predictors with shorter tags:

prednames <- c("NDVI","solar.rad","avg.temp","precip",
                "nutr.avail","water.cap")
names(dat)[1:6] <- prednames

This is all you need to fit all combinations of main effects: since you have six predictors, there are 64 level-1 models (including the null model).

g1 <- glmulti("GPP",xr=prednames,data=dat,level=1)

For a bigger computational challenge:

g2 <- glmulti("GPP",xr=prednames,data=dat,level=2)

I believe there are 2^(choose(6,2)+6) = 2.1 million possible models here. I haven't looked at ?glmulti closely enough to tell it how to stop fitting models. I just started it off (so far it has evaluated 66,000 models), but it has found a 2-level model with AIC about 500.5, which is much better than the min-AIC of 518 in the set of 1-level models ...

PS I played around with settings a bit more, trying the genetic algorithm approach rather than the exhaustive approach (I don't see an obvious way to tell glmulti "use the exhaustive approach, but stop after N tries"). Even with slightly more permissive-than-default genetic algorithm settings, it seems to get stuck at AIC approx 504, above the value found in the (partial) exhaustive screening I tried first.

e.g.:

g2 <- glmulti("GPP",xr=prednames,data=dat,level=2,marginality=TRUE,
              method="g",conseq=25,popsize=500,mutrate=1e-2)

PPS: the reason I was getting better results in the exhaustive case was that I had marginality=FALSE, i.e. the model was allowed to leave out main-effect parameters that were involved in interactions included in the model. This isn't necessarily sensible. If I turn off the marginality constraint, then the genetic algorithm can get down to AIC=499 without too much trouble ...

glmulti("GPP",xr=prednames,data=dat,level=2,marginality=TRUE,
              method="d")

is also useful: it prints out the number of candidate models defined for a given specification.

like image 190
Ben Bolker Avatar answered Sep 18 '22 16:09

Ben Bolker