
How to generate a prediction interval from a regression tree rpart object?

Tags: r, tree, prediction

How do you generate a prediction interval from a regression tree that is fit using rpart?

As I understand it, a regression tree predicts the response as the mean of the training observations in the leaf node that a new observation falls into. I don't know how to get the variance for a leaf node from the model, but I would like to simulate from the leaf's mean and variance to obtain a prediction interval.

predict.rpart() has no option for an interval.

Example: I fit a tree to the iris data, but predict() does not accept type = "interval":

> library(rpart)
> r1 <- rpart(Sepal.Length ~ ., cp = 0.001, data = iris[-nrow(iris), ])
> predict(r1, newdata = iris[nrow(iris), ], type = "interval")
Error in match.arg(type) : 
  'arg' should be one of “vector”, “prob”, “class”, “matrix”
goldisfine asked Mar 18 '15 19:03



1 Answer

Perhaps one option is a simple bootstrap of your training data?

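library(rpart)
library(boot)

# Hold out the last row as the observation to predict
trainData <- iris[-150L, ]
predictData <- iris[150L, ]

# Refit the tree on each bootstrap resample and predict the held-out row
rboot <- boot(trainData, function(data, idx) {
    bootstrapData <- data[idx, ]
    r1 <- rpart(Sepal.Length ~ ., data = bootstrapData, cp = 0.001)
    predict(r1, newdata = predictData)
}, R = 1000L)

# Percentile interval from the 1000 bootstrap predictions
quantile(rboot$t, c(0.025, 0.975))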
    2.5%    97.5% 
5.871393 6.766842 
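
Note that the bootstrap quantiles describe how much the fitted leaf mean moves between resamples, so they behave more like a confidence interval for the prediction than a prediction interval for a new observation. If what you want is the leaf-level mean and variance mentioned in the question, a rough sketch is below. It relies on the `where` component of the fitted object (the row of `r1$frame` holding each training observation's leaf) and on a common trick of overwriting `yval` in a copy of the tree so that predict() returns the leaf's frame row instead of its mean. The names `nodeTree`, `newLeaf` and `leafResponses` are just illustrative, and the normal approximation is an assumption that may be poor for small leaves.

```r
library(rpart)

trainData   <- iris[-150L, ]
predictData <- iris[150L, ]

r1 <- rpart(Sepal.Length ~ ., data = trainData, cp = 0.001)

# Copy the tree and replace each node's fitted value with its row index in
# the frame, so predict() reports which leaf the new observation lands in.
nodeTree <- r1
nodeTree$frame$yval <- seq_len(nrow(nodeTree$frame))
newLeaf <- unname(predict(nodeTree, newdata = predictData))

# Training responses that fell into that same leaf
leafResponses <- trainData$Sepal.Length[r1$where == newLeaf]

leafMean <- mean(leafResponses)  # matches predict(r1, newdata = predictData)
leafVar  <- var(leafResponses)

# Empirical 95% interval from the responses in the leaf
empiricalInterval <- quantile(leafResponses, c(0.025, 0.975))

# Normal-approximation 95% interval (assumes roughly normal leaf responses)
normalInterval <- leafMean + c(-1, 1) * qnorm(0.975) * sqrt(leafVar)

empiricalInterval
normalInterval
```

Both intervals ignore the uncertainty in the tree structure itself, and with only a handful of observations per leaf the quantiles will be noisy, so treat this as a starting point rather than a calibrated interval.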
Jeff answered Oct 22 '22 09:10
