How to obtain a LM object from regsubsets

Question

Let's imagine we want to model the US State Public-School Expenditures (education) using income, young, urban and region as regressors. For more info: ?Anscombe Model: education ~ (income+young+urban)*region

library(car)
library(leaps)

#Loading Data
data(Anscombe)
data(state)
stateinfo <- data.frame(region=state.region,row.names=state.abb)
datamodel <- data.frame(merge(stateinfo,Anscombe,by="row.names"),row.names=1)
head(datamodel)
   region education income young urban
AK   West       372   4146 439.7   484
AL  South       112   2337 362.2   584
AR  South       134   2322 351.9   500
AZ   West       207   3027 387.5   796
CA   West       273   3968 348.4   909
CO   West       192   3340 358.1   785

#Saturated Model
MOD1 <- lm(education~(.-region)*region,datamodel)
summary(MOD1)
#anova(MOD1)

#Look for the "best" model
MOD1.subset <- regsubsets(education~(.-region)*region,datamodel,nvmax=15)
plot(MOD1.subset)

The model with 3 variables and 1 interaction (education~income+young+urban+RegionWest:young) seems to be the best in terms of BIC.

coef(MOD1.subset,4)

The questions is, how do I obtain the ML object from that model without writing the formula by hand?

Before posting, I found the package HH which has some interesting functions for regsubsets objects such as summaryHH and lm.regsubsets.

library(HH)
summaryHH(MOD1.subset)[4,]
lm.regsubsets(MOD1.subset,4)

lm.regsubsets does what I want but I think that has some problems parsing interactions, any alternatives?

Gavin Simpson · Accepted Answer

I don't think this is going to be possible, at least not without a lot of processing of the names of the coefficients. I got ~95% of the way there but fell down on the interaction term:

coefs <- coef(MOD1s, 4)
nams <- names(coefs)
nams <- nams[!nams %in% "(Intercept)"]
response <- as.character(as.formula(MOD1s$call[[2]])[[2]])
form <- as.formula(paste(response, paste(nams, collapse = " + "), sep = " ~ "))
df <- get(as.character(MOD1s$call[[3]]))
mod <- lm(form, data = df)

> mod <- lm(form, data = df)
Error in eval(expr, envir, enclos) : object 'regionWest' not found

Which makes sense and arise from the names used for the coefficients:

> nams
[1] "income"           "young"            "urban"           
[4] "regionWest:young"

It is quite likely that with some effort you could do:

split any names with : on :,
for each side, see if there is a partial match with the variables in the data frame df
if there is a match, check that the bit not matched matches a level of the variable in df after coercing to a factor if required,
if there is a match with a level, replace that side of the interaction with the variable name,
if there isn't a match, see if there are any other partial matched, fail if there aren't

and so on. That is quite a lot of programming involved for a [so] posting, but if you are up to the challenge then the above should give you something to start from.

How to obtain a LM object from regsubsets

Tags:

r

statistics

glm

Emer

Video Answer

1 Answers

Gavin Simpson

Recent Activity

Donate For Us

How to obtain a LM object from regsubsets

Tags:

r

statistics

glm

Emer

Video Answer

1 Answers

Gavin Simpson

Related questions

Recent Activity

Donate For Us