backward elimination in R

Tags:

r

I am trying to obtain a final model by backward elimination in R, but I got the following error message when I ran the code. Could anyone please help me with this?

base<-lm(Eeff~NDF,data=phuong)
fullmodel<-lm(Eeff~NDF+ADF+CP+NEL+DMI+FCM,data=phuong)
step(fullmodel, direction = "backward", trace = FALSE)

Error in step(fullmodel, direction = "backward", trace = FALSE) : 
  number of rows in use has changed: remove missing values?
hn.phuong asked Aug 01 '12 21:08



1 Answer

When comparing different submodels, it is necessary that they be fitted to the same set of data; otherwise the comparison doesn't make sense. (Consider the extreme situation where you have two predictors A and B, each measured on a different half of your observations: the model y~A+B can only use rows where both A and B are present, which here is none at all, while the models y~A and y~B will be fitted to non-overlapping subsets of the data.) Thus, step won't allow you to compare submodels that, because of the automatic removal of cases containing NA values, are using different subsets of the original data set.
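To make the point concrete, here is a minimal sketch on made-up data (the data frame `d` and its columns are hypothetical, not the asker's data). It only shows that `lm` silently uses a different number of rows for each submodel when the predictors have missing values in different places:

```r
# Hypothetical toy data: A is missing in 3 rows, B in 2 other rows
set.seed(1)
d <- data.frame(y = rnorm(10), A = rnorm(10), B = rnorm(10))
d$A[1:3]  <- NA
d$B[9:10] <- NA

# lm() drops rows with NA in any variable used by that particular model
n_full <- nobs(lm(y ~ A + B, data = d))  # rows where both A and B are present
n_A    <- nobs(lm(y ~ A, data = d))      # rows where A is present
n_B    <- nobs(lm(y ~ B, data = d))      # rows where B is present

print(c(full = n_full, A = n_A, B = n_B))
```

The three fits rest on 5, 7 and 8 observations respectively, so criteria such as AIC are not comparable between them.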

Using na.omit on the original data set should fix the problem.

fullmodel <- lm(Eeff ~ NDF + ADF + CP + NEL + DMI + FCM, data = na.omit(phuong))
step(fullmodel, direction = "backward", trace = FALSE)
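One thing to watch for: na.omit applied to the whole data frame also drops rows that have NA in columns the model never uses. A possible variation, sketched here on simulated stand-in data (the values, the sample size and the extra `Notes` column are assumptions made up for illustration), is to restrict na.omit to the model variables first:

```r
# Simulated stand-in for the asker's data; values and the Notes column are hypothetical
set.seed(42)
n    <- 40
vars <- c("Eeff", "NDF", "ADF", "CP", "NEL", "DMI", "FCM")
phuong <- as.data.frame(matrix(rnorm(n * length(vars)), nrow = n,
                               dimnames = list(NULL, vars)))
phuong$Notes <- NA_character_   # unrelated column that is entirely missing
phuong$ADF[c(2, 5)] <- NA
phuong$FCM[7] <- NA

nrow(na.omit(phuong))                  # every row is dropped because of Notes

# Keep only the model variables, then drop incomplete rows
phuong_cc <- na.omit(phuong[, vars])
nrow(phuong_cc)                        # only the 3 rows with missing predictors are lost

fullmodel <- lm(Eeff ~ NDF + ADF + CP + NEL + DMI + FCM, data = phuong_cc)
final <- step(fullmodel, direction = "backward", trace = FALSE)
formula(final)
```

Because every candidate model is now fitted to the same rows of `phuong_cc`, step can compare them without raising the error.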

However, if you have many NA values spread across different predictors, you may end up losing much of your data set; in an extreme case you could lose all of it. If this happens, you will have to reconsider your modeling strategy.
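Before committing to na.omit, it may help to check how much data would be lost. A small sketch on hypothetical data (the data frame `d` is invented for illustration) that counts missing values per column and the number of complete rows:

```r
# Hypothetical data with NAs scattered over different predictors
set.seed(7)
d <- data.frame(y = rnorm(8), A = rnorm(8), B = rnorm(8), C = rnorm(8))
d$A[c(1, 2)]    <- NA
d$B[c(3, 4, 5)] <- NA
d$C[c(5, 6)]    <- NA

na_per_column <- colSums(is.na(d))        # missing values in each column
n_complete    <- sum(complete.cases(d))   # rows that na.omit() would keep

print(na_per_column)
print(n_complete)
```

Here no single column is missing more than 3 of 8 values, yet only 2 rows are complete, which is the kind of loss the answer warns about.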

Ben Bolker answered Nov 15 '22 19:11