randomForestSRC predicted values

Tags:

r

I've been trying to use the R package 'randomForestSRC' for prediction. After running 'rfsrc' and 'predict.rfsrc', both return an element called predicted, but the predicted values don't seem to correlate with any of my values. Does anyone know what these predicted values are?

The commands I ran (taken from the examples in the package documentation):

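library(randomForestSRC)
data(veteran, package = "randomForestSRC")
# Hold out 20% of the rows for testing
train <- sample(1:nrow(veteran), round(nrow(veteran) * 0.80))
# Grow a survival forest on the training rows, then predict on the held-out rows
veteran.grow <- rfsrc(Surv(time, status) ~ ., veteran[train, ], ntree = 100)
veteran.pred <- predict(veteran.grow, veteran[-train, ])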

The predicted values:

veteran.pred$predicted
[1] 49.96350 58.45100 38.28317 63.17000 67.56917 57.45633 66.23733 54.81967 72.60817 47.71083 43.94983 37.85000
[13] 41.80333 47.84233 85.81488 70.49050 92.45600 70.95321 85.63933 45.38833 66.74655 76.46067 52.68717 68.90750
[25] 85.17983 43.31617 48.80267
Allen Huang asked Jun 22 '16 17:06

1 Answer

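The predicted element returned by both rfsrc and predict.rfsrc is the ensemble prediction based on all the trees built, for the training data and the test data respectively. For a survival forest such as this one, the package documentation describes that prediction as mortality (the expected number of events), so it is not on the scale of the observed survival times.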

# In-bag predicted value for the first case in the training data
veteran.grow$predicted[1]
[1] 80.56843

# Prediction based on all trees for the same case
predict(veteran.grow,
        newdata = veteran[train[1], ])$predicted
[1] 80.56843

rfsrc also returns the out-of-bag (OOB) prediction as predicted.oob, which is based only on the trees that did not use the case in the building process. For example, if case 1 was used in trees 1 to 30, the OOB prediction for case 1 would be based on trees 31 to 100 instead of all the trees.

# Refit with membership = TRUE to keep the node and in-bag information for each tree
veteran.grow <- rfsrc(Surv(time, status) ~ ., veteran[train, ], ntree = 100,
                      membership = TRUE)

# Out-of-bag predicted value for the first case
veteran.grow$predicted.oob[1]
[1] 72.88305

# Prediction based only on the trees that case 1 was not included in
ind <- which(veteran.grow$inbag[1, ] == 0)
predict(veteran.grow,
        newdata = veteran[train[1], ],
        get.tree = ind)$predicted
[1] 72.88305
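
To make the difference between the two numbers concrete, here is a toy sketch in base R that does not use randomForestSRC at all. The per-tree predictions and in-bag counts are made-up values for a single case and five hypothetical trees, and plain averaging across trees is assumed as the way the ensemble is formed; it only illustrates which trees enter each average.

```r
# Toy illustration only: one case, five hypothetical trees.
# tree_pred and inbag are made-up numbers, not output from randomForestSRC.
tree_pred <- c(80, 90, 70, 60, 75)  # hypothetical per-tree prediction for the case
inbag     <- c(1, 2, 0, 0, 1)       # times the case was drawn into each tree's bootstrap sample

# Prediction using every tree (analogous to `predicted`)
pred_all <- mean(tree_pred)

# OOB prediction: average only over trees that never saw the case
# (analogous to `predicted.oob`)
oob_trees <- which(inbag == 0)
pred_oob  <- mean(tree_pred[oob_trees])

print(pred_all)   # 75
print(pred_oob)   # 65
```

Here trees 3 and 4 are the only ones with an in-bag count of zero, so the OOB value is the mean of 70 and 60. That is the same selection that which(veteran.grow$inbag[1, ] == 0) performs above.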

Ryan SY Kwan answered Nov 03 '22 21:11