Let's imagine we want to model the US State Public-School Expenditures (education) using income, young, urban and region as regressors. For more info: ?Anscombe
Model: education ~ (income+young+urban)*region
library(car)
library(leaps)
#Loading Data
data(Anscombe)
data(state)
stateinfo <- data.frame(region=state.region,row.names=state.abb)
datamodel <- data.frame(merge(stateinfo,Anscombe,by="row.names"),row.names=1)
head(datamodel)
region education income young urban
AK West 372 4146 439.7 484
AL South 112 2337 362.2 584
AR South 134 2322 351.9 500
AZ West 207 3027 387.5 796
CA West 273 3968 348.4 909
CO West 192 3340 358.1 785
#Saturated Model
MOD1 <- lm(education~(.-region)*region,datamodel)
summary(MOD1)
#anova(MOD1)
#Look for the "best" model
MOD1.subset <- regsubsets(education~(.-region)*region,datamodel,nvmax=15)
plot(MOD1.subset)
The model with 3 variables and 1 interaction (education~income+young+urban+RegionWest:young) seems to be the best in terms of BIC.
coef(MOD1.subset,4)
The questions is, how do I obtain the ML object from that model without writing the formula by hand?
Before posting, I found the package HH which has some interesting functions for regsubsets objects such as summaryHH
and lm.regsubsets
.
library(HH)
summaryHH(MOD1.subset)[4,]
lm.regsubsets(MOD1.subset,4)
lm.regsubsets
does what I want but I think that has some problems parsing interactions, any alternatives?
I don't think this is going to be possible, at least not without a lot of processing of the names of the coefficients. I got ~95% of the way there but fell down on the interaction term:
coefs <- coef(MOD1s, 4)
nams <- names(coefs)
nams <- nams[!nams %in% "(Intercept)"]
response <- as.character(as.formula(MOD1s$call[[2]])[[2]])
form <- as.formula(paste(response, paste(nams, collapse = " + "), sep = " ~ "))
df <- get(as.character(MOD1s$call[[3]]))
mod <- lm(form, data = df)
> mod <- lm(form, data = df)
Error in eval(expr, envir, enclos) : object 'regionWest' not found
Which makes sense and arise from the names used for the coefficients:
> nams
[1] "income" "young" "urban"
[4] "regionWest:young"
It is quite likely that with some effort you could do:
:
on :
,df
df
after coercing to a factor if required,and so on. That is quite a lot of programming involved for a [so] posting, but if you are up to the challenge then the above should give you something to start from.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With