 

forward stepwise regression

Tags: r, regression

In R stepwise forward regression, I specify a minimal model and a set of variables to add (or not to add):

min.model = lm(y ~ 1)
fwd.model = step(min.model, direction='forward', scope=(~ x1 + x2 + x3 + ...))

Is there any way to specify using all variables in a matrix/data.frame, so I don't have to enumerate them?

Examples to illustrate what I'd like to do, but they don't work:

# 1
fwd.model = step(min.model, direction='forward', scope=(~ ., data=my.data.frame))

# 2
min.model = lm(y ~ 1, data=my.data.frame)
fwd.model = step(min.model, direction='forward', scope=(~ .))
Michael Schubert asked Apr 07 '14 09:04



People also ask

What does forward stepwise regression do?

Forward stepwise regression starts from the null model and adds, one at a time, the variable that improves the model the most, until the stopping criterion is met.

What is the difference between forward and stepwise regression?

Stepwise regression is a modification of forward selection: after each step in which a variable is added, all variables already in the model are checked to see whether their significance has fallen below the specified tolerance level. If a nonsignificant variable is found, it is removed from the model.

What is forward linear regression?

Forward selection is a type of stepwise regression which begins with an empty model and adds in variables one by one. In each forward step, you add the one variable that gives the single best improvement to your model.
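
A single forward step can be sketched with `add1()`, which scores each candidate term against the current model. This is a minimal sketch, not part of the original page: the data frame `d`, its columns and the simulated relationship are assumptions made up for illustration, and AIC is used as the criterion because that is what `add1()` reports for `lm` fits.

```r
# Hypothetical data: y depends strongly on x1 only.
set.seed(1)
d <- data.frame(x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50))
d$y <- 3 * d$x1 + rnorm(50)

# Start from the empty (intercept-only) model.
null.model <- lm(y ~ 1, data = d)

# Score every candidate term; the "<none>" row is the current model.
candidates <- add1(null.model, scope = ~ x1 + x2 + x3)

# Pick the row with the lowest AIC.
best <- rownames(candidates)[which.min(candidates$AIC)]

# Add that term unless leaving the model unchanged is already best.
if (best != "<none>") {
  next.model <- update(null.model, as.formula(paste(". ~ . +", best)))
}
```

`step(..., direction = 'forward')` repeats this kind of comparison until no addition lowers the AIC.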

What is forward and backward selection?

Forward selection starts with a (usually empty) set of variables and adds variables to it, until some stopping criterion is met. Similarly, backward selection starts with a (usually complete) set of variables and then excludes variables from that set, again, until some stopping criterion is met.
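
In R, the two starting points map onto the `direction` argument of `step()`. The sketch below uses made-up data (`d` and its columns are assumptions), and note that `step()` selects by AIC rather than by a significance tolerance.

```r
# Hypothetical data: y depends strongly on x1 only.
set.seed(1)
d <- data.frame(x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50))
d$y <- 3 * d$x1 + rnorm(50)

# Backward selection: start from the full model and drop terms.
full.model <- lm(y ~ ., data = d)
bwd.model <- step(full.model, direction = "backward", trace = 0)

# "both": start from the empty model; terms may be added and later dropped.
null.model <- lm(y ~ 1, data = d)
both.model <- step(null.model, direction = "both",
                   scope = list(lower = ~ 1, upper = formula(full.model)),
                   trace = 0)
```

`trace = 0` only suppresses the per-step printout; drop it to see the tables of candidate terms.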


2 Answers

scope expects (quoting the help page ?step):

either a single formula, or a list containing components ‘upper’ and ‘lower’, both formulae. See the details for how to specify the formulae and how they are used.

You can extract and use the formula corresponding to "~." like this:

> my.data.frame=data.frame(y=rnorm(20),foo=rnorm(20),bar=rnorm(20),baz=rnorm(20))
> min.model = lm(y ~ 1, data=my.data.frame)
> biggest <- formula(lm(y~.,my.data.frame))
> biggest
y ~ foo + bar + baz
> fwd.model = step(min.model, direction='forward', scope=biggest)
Start:  AIC=0.48
y ~ 1

       Df Sum of Sq    RSS      AIC
+ baz   1    2.5178 16.015 -0.44421
<none>              18.533  0.47614
+ foo   1    1.3187 17.214  0.99993
+ bar   1    0.4573 18.075  1.97644

Step:  AIC=-0.44
y ~ baz

       Df Sum of Sq    RSS      AIC
<none>              16.015 -0.44421
+ foo   1   0.41200 15.603  1.03454
+ bar   1   0.20599 15.809  1.29688
> 
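
If fitting the full model just to read off its formula feels wasteful, the same scope can probably be built from the column names alone. This is a sketch under the assumption that the response column is called `y` and every other column is a candidate predictor; `predictors` is a name introduced here for illustration. It relies on `reformulate()` from the stats package.

```r
# Same shape of data as in the answer above (values are random).
set.seed(42)
my.data.frame <- data.frame(y = rnorm(20), foo = rnorm(20),
                            bar = rnorm(20), baz = rnorm(20))

# Every column except the response is a candidate term.
predictors <- setdiff(names(my.data.frame), "y")

# Build y ~ foo + bar + baz without fitting anything.
biggest <- reformulate(predictors, response = "y")

min.model <- lm(y ~ 1, data = my.data.frame)
fwd.model <- step(min.model, direction = "forward", scope = biggest, trace = 0)
```

A bare `~ .` in `scope` does not work because the dot is not expanded against the data frame there, so the variable names have to be spelled out in the formula one way or another.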
Stephan Kolassa answered Sep 19 '22 14:09



You can do it in one step like this

fwd.model = step(lm(y ~ 1, data=my.data.frame), direction='forward', scope=~ x1 + x2 + x3 + ...)
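
To keep the single call but avoid typing the variables, one option is to let `terms()` expand `y ~ .` against the data frame and pass the result as `scope`. This is a sketch with made-up data, assuming the response column is named `y`.

```r
# Hypothetical data frame with a response y and three predictors.
set.seed(42)
my.data.frame <- data.frame(y = rnorm(20), x1 = rnorm(20),
                            x2 = rnorm(20), x3 = rnorm(20))

# terms() with a data argument expands the dot to all non-response columns;
# formula() turns the result back into an ordinary formula.
fwd.model <- step(lm(y ~ 1, data = my.data.frame),
                  direction = "forward",
                  scope = formula(terms(y ~ ., data = my.data.frame)),
                  trace = 0)
```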

shiny answered Sep 19 '22 14:09
