Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to obtain a LM object from regsubsets

Tags:

r

statistics

glm

Let's imagine we want to model the US State Public-School Expenditures (education) using income, young, urban and region as regressors. For more info: ?Anscombe Model: education ~ (income+young+urban)*region

library(car)
library(leaps)

#Loading Data
data(Anscombe)
data(state)
stateinfo <- data.frame(region=state.region,row.names=state.abb)
datamodel <- data.frame(merge(stateinfo,Anscombe,by="row.names"),row.names=1)
head(datamodel)
   region education income young urban
AK   West       372   4146 439.7   484
AL  South       112   2337 362.2   584
AR  South       134   2322 351.9   500
AZ   West       207   3027 387.5   796
CA   West       273   3968 348.4   909
CO   West       192   3340 358.1   785

#Saturated Model
MOD1 <- lm(education~(.-region)*region,datamodel)
summary(MOD1)
#anova(MOD1)

#Look for the "best" model
MOD1.subset <- regsubsets(education~(.-region)*region,datamodel,nvmax=15)
plot(MOD1.subset) 

The model with 3 variables and 1 interaction (education~income+young+urban+RegionWest:young) seems to be the best in terms of BIC.

coef(MOD1.subset,4)

The questions is, how do I obtain the ML object from that model without writing the formula by hand?

Before posting, I found the package HH which has some interesting functions for regsubsets objects such as summaryHH and lm.regsubsets.

library(HH)
summaryHH(MOD1.subset)[4,]
lm.regsubsets(MOD1.subset,4)

lm.regsubsets does what I want but I think that has some problems parsing interactions, any alternatives?

like image 277
Emer Avatar asked Oct 25 '12 07:10

Emer


Video Answer


1 Answers

I don't think this is going to be possible, at least not without a lot of processing of the names of the coefficients. I got ~95% of the way there but fell down on the interaction term:

coefs <- coef(MOD1s, 4)
nams <- names(coefs)
nams <- nams[!nams %in% "(Intercept)"]
response <- as.character(as.formula(MOD1s$call[[2]])[[2]])
form <- as.formula(paste(response, paste(nams, collapse = " + "), sep = " ~ "))
df <- get(as.character(MOD1s$call[[3]]))
mod <- lm(form, data = df)

> mod <- lm(form, data = df)
Error in eval(expr, envir, enclos) : object 'regionWest' not found

Which makes sense and arise from the names used for the coefficients:

> nams
[1] "income"           "young"            "urban"           
[4] "regionWest:young"

It is quite likely that with some effort you could do:

  1. split any names with : on :,
  2. for each side, see if there is a partial match with the variables in the data frame df
  3. if there is a match, check that the bit not matched matches a level of the variable in df after coercing to a factor if required,
  4. if there is a match with a level, replace that side of the interaction with the variable name,
  5. if there isn't a match, see if there are any other partial matched, fail if there aren't

and so on. That is quite a lot of programming involved for a [so] posting, but if you are up to the challenge then the above should give you something to start from.

like image 185
Gavin Simpson Avatar answered Nov 01 '22 12:11

Gavin Simpson