
Different results using Random Forest prediction in R

When I run a random forest model over my test data, I get different results for the same data set and the same model.

Here are the results where you can see the difference over the first column:

> table((predict(rfModelsL[[1]],newdata = a)) ,a$earlyR)

        FALSE TRUE
 FALSE    14    7
 TRUE     13   66

> table((predict(rfModelsL[[1]],newdata = a)) ,a$earlyR)

        FALSE TRUE
 FALSE    15    7
 TRUE     12   66

Although the difference is very small, I'm trying to understand what causes it. I'm guessing that predict has a "flexible" classification threshold, although I couldn't find anything about that in the documentation. Am I right?

Thank you in advance

asked Jan 24 '17 by staove7


1 Answer

I will assume that you did not refit the model here, and that it is simply the predict call that is producing these results. The answer is probably this, from ?predict.randomForest:

Any ties are broken at random, so if this is undesirable, avoid it by using odd number ntree in randomForest()
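To illustrate: with an even ntree, a test case can receive exactly ntree/2 votes for each class, and that tie is broken at random on every predict call. With an odd ntree a 50/50 split is impossible in binary classification, so repeated predictions are identical. A minimal sketch with simulated data (not the asker's rfModelsL or a, which are assumed here to be a fitted model list and a test data frame):

```r
# Sketch assuming the randomForest package and a binary factor outcome.
library(randomForest)

set.seed(1)
train <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
train$y <- factor(train$x1 + train$x2 + rnorm(200) > 0)

# Odd ntree: no exact 50/50 vote split is possible for two classes,
# so repeated predictions on the same newdata agree.
fit <- randomForest(y ~ ., data = train, ntree = 501)

test <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
p1 <- predict(fit, newdata = test)
p2 <- predict(fit, newdata = test)
identical(p1, p2)  # TRUE with an odd ntree
```

With an even ntree (e.g. ntree = 500), any borderline test cases may flip between runs, which matches the one-observation difference in the two tables above.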

answered Oct 06 '22 by mpjdem