R package randomForest reports mean squared errors for each tree in the forest. I need, however, a measure of confidence for each case in the data. Since randomForest calculates the casewise predictions by averaging the predictions of the single trees, I guess that it should also be possible to calculate a casewise standard error and thus a confidence interval. Can this be done using the output randomForest object (if so: how?) or do I have to dig into the source code?
No need to dig into the source code. You only need to read the documentation. ?predict.randomForest
states that one of its arguments is called predict.all
:
predict.all Should the predictions of all trees be kept?
So setting that to TRUE
will keep a prediction for each case, for each tree, which you can then use to calculate standard error for each case.
I have recently been made aware of this paper by Stefan Wager, Trevor Hastie and Brad Efron which investigates more rigorously the idea of standard errors for the predictions generated by random forests (and other bagged predictors).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With