Random Forest Cross-Validation in R

I am working on a random forest in R and I would like to add 10-fold cross-validation to my model, but I am stuck. Here is a sample of my code:

install.packages('randomForest')
library(randomForest)
set.seed(123)
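# Predictors in a formula are joined with "+", not commas; "..." stands for the features omitted here
fit <- randomForest(as.factor(sickrabbit) ~ Feature1 + ... + FeatureN, data = training1, importance = TRUE, sampsize = c(200, 300), ntree = 500)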

I found the function rfcv in caret online, but I am not sure I understand how it works. Can anyone help with this function or propose an easier way to implement cross-validation? Can it be done with the randomForest package instead of caret?

Rita A. Singer asked Jul 26 '15 13:07



1 Answer

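You don't need to cross-validate a random forest model to estimate its test error: the out-of-bag (OOB) error already provides that estimate. You are getting stuck with the randomForest package because it wasn't designed around k-fold cross-validation.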

Here is a snippet from Breiman's official documentation:

In random forests, there is no need for cross-validation or a separate test set to get an unbiased estimate of the test set error. It is estimated internally, during the run, as follows:

Each tree is constructed using a different bootstrap sample from the original data. About one-third of the cases are left out of the bootstrap sample and not used in the construction of the kth tree.

Put each case left out in the construction of the kth tree down the kth tree to get a classification. In this way, a test set classification is obtained for each case in about one-third of the trees. At the end of the run, take j to be the class that got most of the votes every time case n was oob. The proportion of times that j is not equal to the true class of n averaged over all cases is the oob error estimate. This has proven to be unbiased in many tests.
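As a rough illustration of the quoted procedure, the sketch below reads the OOB error from a fitted forest and, for comparison, runs a manual 10-fold cross-validation using only the randomForest package. It is a minimal sketch under assumptions: the built-in iris data stands in for training1 (which is not shown in the question), and the names dat, folds, cv_errors and cv_error are hypothetical helpers introduced here, not part of the original code.

```r
library(randomForest)

set.seed(123)

# Assumption: iris stands in for the asker's training1 data
dat <- iris

# Fit on all rows; the OOB error is computed internally during training
fit <- randomForest(Species ~ ., data = dat, ntree = 500)

# err.rate has one row per tree; the "OOB" column in the last row
# is the OOB error of the full forest
oob_error <- unname(fit$err.rate[fit$ntree, "OOB"])

# Manual 10-fold cross-validation for comparison
k <- 10
# Randomly assign each row to one of k roughly equal folds
folds <- sample(rep(seq_len(k), length.out = nrow(dat)))

cv_errors <- sapply(seq_len(k), function(i) {
  train <- dat[folds != i, ]
  test  <- dat[folds == i, ]
  m <- randomForest(Species ~ ., data = train, ntree = 500)
  # Misclassification rate on the held-out fold
  mean(predict(m, newdata = test) != test$Species)
})
cv_error <- mean(cv_errors)

cat(sprintf("OOB error:        %.3f\n", oob_error))
cat(sprintf("10-fold CV error: %.3f\n", cv_error))
```

On a dataset like this the two estimates would be expected to land close together, which is the point of the passage above. Note also that rfcv is part of randomForest rather than caret, and that it reports cross-validated error as predictors are progressively dropped, so it is aimed at feature selection rather than at scoring a single fixed model.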

Tim Biegeleisen answered Oct 29 '22 03:10
