How do you generate a prediction interval from a regression tree that is fit using rpart?
It is my understanding that a regression tree predicts the response as the mean of the training observations in the leaf node that an observation falls into. I don't know how to get the variance of a leaf node from the model, but what I would like to do is simulate from the mean and variance of that leaf node to obtain a prediction interval.
predict.rpart() doesn't offer an interval option.
Example: I fit a tree to the iris data, but predict() has no "interval" type:
> library(rpart)
> r1 <- rpart(Sepal.Length ~ ., data = iris[-nrow(iris), ], cp = 0.001)
> predict(r1, newdata = iris[nrow(iris), ], type = "interval")
Error in match.arg(type) :
'arg' should be one of “vector”, “prob”, “class”, “matrix”
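A minimal sketch of the simulation idea from the question, assuming the default method = "anova". For a regression tree, each row of r1$frame holds the node size (n), the node mean (yval) and the within-node residual sum of squares (dev), so a leaf's variance can be estimated as dev / (n - 1). The leaf lookup is an assumed workaround: predict() on an rpart fit does not return the leaf for new data, so the new prediction is matched against the leaf means, which assumes no two leaves share the same mean. The normal distribution is also an assumption, and this interval ignores uncertainty in the leaf mean itself. The names fr, leafRows, leafMean, leafSd and sims are illustrative only.

```r
library(rpart)

set.seed(1)
trainData <- iris[-150L, ]
predictData <- iris[150L, ]
r1 <- rpart(Sepal.Length ~ ., data = trainData, cp = 0.001)

# For method = "anova", each row of r1$frame stores the node size (n),
# the node mean (yval) and the within-node sum of squares (dev)
fr <- r1$frame
leafRows <- which(fr$var == "<leaf>")

# predict() does not return the leaf for new data, so match the prediction
# to the leaf means (assumption: no two leaves have the same mean)
yhat <- predict(r1, newdata = predictData)
leaf <- leafRows[which.min(abs(fr$yval[leafRows] - yhat))]

# Leaf mean and sample standard deviation of the training responses in it
leafMean <- fr$yval[leaf]
leafSd <- sqrt(fr$dev[leaf] / (fr$n[leaf] - 1))

# Simulate new responses, assuming they are normal within the leaf
sims <- rnorm(10000L, mean = leafMean, sd = leafSd)
pi95 <- quantile(sims, c(0.025, 0.975))
pi95
```

The same leaf standard deviation can be cross-checked against the training data with sd(trainData$Sepal.Length[r1$where == leaf]), since r1$where gives each training row's leaf as a row number of r1$frame.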
Perhaps one option is a simple bootstrap of your training data?
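library(rpart)
library(boot)

# Hold out the last observation; fit on the remaining 149 rows
trainData <- iris[-150L, ]
predictData <- iris[150L, ]

# Statistic for boot(): refit the tree on a resample of the rows
# and return its prediction for the held-out observation
rboot <- boot(trainData, function(data, idx) {
  bootstrapData <- data[idx, ]
  r1 <- rpart(Sepal.Length ~ ., bootstrapData, cp = 0.001)
  predict(r1, newdata = predictData)
}, R = 1000L)

# 2.5% and 97.5% quantiles of the bootstrapped predictions (vary between runs)
quantile(rboot$t, c(0.025, 0.975))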
2.5% 97.5%
5.871393 6.766842
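Note that the quantiles of rboot$t describe how much the fitted leaf mean varies across resamples, so they behave more like a confidence interval for the prediction than a prediction interval for a new observation. A hedged sketch of one common heuristic, assuming roughly constant residual variance across leaves, adds one resampled residual from each bootstrap fit to that fit's prediction. This is not an rpart feature, and with cp = 0.001 the in-sample residuals of a deep tree understate the noise, so the resulting interval is likely still somewhat too narrow. The names rbootPI, res and pi95 are illustrative.

```r
library(rpart)
library(boot)

set.seed(1)
trainData <- iris[-150L, ]
predictData <- iris[150L, ]

# Statistic: refit the tree on a resample, then add one resampled residual
# from that refit, so the interval reflects observation noise as well as
# uncertainty in the fitted leaf mean (assumes homoscedastic residuals)
rbootPI <- boot(trainData, function(data, idx) {
  bootstrapData <- data[idx, ]
  r1 <- rpart(Sepal.Length ~ ., bootstrapData, cp = 0.001)
  res <- bootstrapData$Sepal.Length - predict(r1)
  predict(r1, newdata = predictData) + sample(res, 1L)
}, R = 1000L)

pi95 <- quantile(rbootPI$t, c(0.025, 0.975))
pi95
```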