When I'm running random forest model over my test data I'm getting different results for the same data set + model.
Here are the results where you can see the difference over the first column:
> table(predict(rfModelsL[[1]], newdata = a), a$earlyR)

        FALSE TRUE
  FALSE    14    7
  TRUE     13   66

> table(predict(rfModelsL[[1]], newdata = a), a$earlyR)

        FALSE TRUE
  FALSE    15    7
  TRUE     12   66
Although the difference is very small, I'm trying to understand what causes it. My guess is that predict uses a "flexible" classification threshold, although I couldn't find anything about that in the documentation. Am I right?
Thank you in advance.
I will assume that you did not refit the model here, and that it is simply the repeated predict call that is producing these results. The answer is probably this, from ?predict.randomForest:

"Any ties are broken at random, so if this is undesirable, avoid it by using odd number ntree in randomForest()"

In your two confusion matrices exactly one observation flips between FALSE and TRUE across calls, which is the behaviour a randomly broken vote tie would produce: with an even number of trees, a borderline observation can receive exactly half its votes for each class, and the tie is resolved by a coin flip on every call to predict.
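To illustrate the mechanism, here is a minimal sketch (in Python, not the randomForest C internals) of majority voting with random tie-breaking. The forest_predict helper and the vote counts are invented for this illustration; the point is only that an even tree count admits 50/50 ties, while an odd count cannot:

```python
import random

def forest_predict(tree_votes, rng):
    """Majority vote over per-tree boolean class votes.
    Ties are broken at random, mirroring the documented
    behaviour of predict.randomForest (sketch, not real code)."""
    yes = sum(tree_votes)
    no = len(tree_votes) - yes
    if yes > no:
        return True
    if no > yes:
        return False
    return rng.choice([True, False])  # exact tie: coin flip

# A borderline observation under an even ntree can tie exactly ...
even_votes = [True] * 250 + [False] * 250   # 500 trees: perfect 50/50 tie
# ... while an odd ntree can never tie.
odd_votes = [True] * 250 + [False] * 249    # 499 trees: majority always exists

rng = random.Random(0)
results = {forest_predict(even_votes, rng) for _ in range(100)}
print(results)       # both classes appear across repeated calls

odd_results = {forest_predict(odd_votes, rng) for _ in range(100)}
print(odd_results)   # stable: always the same class
```

This is why the documentation suggests an odd ntree: it removes the possibility of a tie entirely, making repeated predictions on the same data deterministic.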