In R stepwise forward regression, I specify a minimal model and a set of variables to add (or not to add):
min.model = lm(y ~ 1)
fwd.model = step(min.model, direction='forward', scope=(~ x1 + x2 + x3 + ...))
Is there any way to specify using all variables in a matrix/data.frame, so I don't have to enumerate them?
Examples to illustrate what I'd like to do, but they don't work:
# 1
fwd.model = step(min.model, direction='forward', scope=(~ ., data=my.data.frame))
# 2
min.model = lm(y ~ 1, data=my.data.frame)
fwd.model = step(min.model, direction='forward', scope=(~ .))
Forward stepwise regression is a stepwise regression approach that starts from the null model and adds, one at a time, the variable that improves the model the most, until a stopping criterion is met.
Stepwise regression is a modification of forward selection in which, after each step that adds a variable, all variables already in the model are checked to see whether their significance has fallen below the specified tolerance level. If a nonsignificant variable is found, it is removed from the model.
Forward selection is a type of stepwise regression which begins with an empty model and adds in variables one by one. In each forward step, you add the one variable that gives the single best improvement to your model.
Forward selection starts with a (usually empty) set of variables and adds variables to it until some stopping criterion is met. Similarly, backward selection starts with a (usually complete) set of variables and then excludes variables from that set, again until some stopping criterion is met.
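The forward-selection loop described above can be sketched by hand with add1() and update(). This is only an illustration of the idea, using AIC as the stopping criterion, and not necessarily how step() is implemented internally. The data frame d and the variables x1, x2, x3 are simulated and hypothetical.

```r
# Hypothetical simulated data: only x1 actually influences y.
set.seed(1)
d <- data.frame(x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50))
d$y <- 2 * d$x1 + rnorm(50)

current   <- lm(y ~ 1, data = d)       # start from the null model
remaining <- c("x1", "x2", "x3")       # candidate terms not yet in the model

while (length(remaining) > 0) {
  # AIC of the current model ("<none>") and of each single-term addition
  cand <- add1(current, scope = remaining)
  best <- rownames(cand)[which.min(cand$AIC)]
  if (best == "<none>") break          # no addition lowers AIC: stop
  current   <- update(current, as.formula(paste(". ~ . +", best)))
  remaining <- setdiff(remaining, best)
}

formula(current)
```

In practice step(..., direction='forward') does this bookkeeping for you; the loop is only meant to make the "add the single best variable, then repeat" rule concrete.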
scope expects (quoting the help page ?step):
either a single formula, or a list containing components ‘upper’ and ‘lower’, both formulae. See the details for how to specify the formulae and how they are used.
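As a small illustration of the list form mentioned in the help text, here is a sketch that passes both 'lower' and 'upper' explicitly. The data frame d and the variable names are made up for the example, and trace = 0 is used only to suppress the printed search path.

```r
# Hypothetical simulated data for illustration only.
set.seed(1)
d <- data.frame(y = rnorm(30), x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))

min.model <- lm(y ~ 1, data = d)

# 'lower' is the smallest model allowed, 'upper' the largest one considered.
fwd.model <- step(min.model, direction = "forward",
                  scope = list(lower = ~ 1, upper = ~ x1 + x2 + x3),
                  trace = 0)

formula(fwd.model)
```

For a forward search, a single formula passed as scope is treated as the upper model, so the list form mainly matters when you also want to fix a non-trivial lower bound.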
You can extract and use the formula corresponding to "~." like this:
> my.data.frame=data.frame(y=rnorm(20),foo=rnorm(20),bar=rnorm(20),baz=rnorm(20))
> min.model = lm(y ~ 1, data=my.data.frame)
> biggest <- formula(lm(y~.,my.data.frame))
> biggest
y ~ foo + bar + baz
> fwd.model = step(min.model, direction='forward', scope=biggest)
Start:  AIC=0.48
y ~ 1

       Df Sum of Sq    RSS      AIC
+ baz   1    2.5178 16.015 -0.44421
<none>              18.533  0.47614
+ foo   1    1.3187 17.214  0.99993
+ bar   1    0.4573 18.075  1.97644

Step:  AIC=-0.44
y ~ baz

       Df Sum of Sq    RSS      AIC
<none>              16.015 -0.44421
+ foo   1   0.41200 15.603  1.03454
+ bar   1   0.20599 15.809  1.29688
>
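If you would rather not fit the full model just to obtain its formula, one possible alternative is to build the scope from the column names with reformulate(). This is a sketch under the assumption that the response column is named y and every other column is a candidate predictor; the data are simulated as in the session above.

```r
# Same shape of data as in the session above (values are random).
set.seed(42)
my.data.frame <- data.frame(y = rnorm(20), foo = rnorm(20),
                            bar = rnorm(20), baz = rnorm(20))

# Every column except the response becomes a candidate term.
predictors <- setdiff(names(my.data.frame), "y")
biggest    <- reformulate(predictors)   # one-sided formula: ~foo + bar + baz

min.model <- lm(y ~ 1, data = my.data.frame)
fwd.model <- step(min.model, direction = "forward", scope = biggest, trace = 0)

formula(fwd.model)
```

Because the scope is built from names(my.data.frame), adding or removing columns changes the candidate set without editing the formula by hand.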
You can do it in one step like this:
fwd.model = step(lm(y ~ 1, data=my.data.frame), direction='forward', scope=~ x1 + x2 + x3 + ...)