
Collecting out-of-fold predictions from a caret model

I want to use the out-of-fold predictions from a caret model to train a second-stage model that includes some of the original predictors. I can collect the out-of-fold predictions as follows:

#Load Data
set.seed(1)
library(caret)
library(mlbench)
data(BostonHousing)

#Build Model (see ?train)
folds <- 10  # number of CV folds; matches the 10 folds in the output below
rpartFit <- train(medv ~ . + rm:lstat, data = BostonHousing, method="rpart",
               trControl=trainControl(method='cv', number=folds, 
                                        savePredictions=TRUE))

#Collect out-of-fold predictions
out_of_fold <- rpartFit$pred
bestCP <- rpartFit$bestTune[,'.cp']
out_of_fold <- out_of_fold[out_of_fold$.cp==bestCP,]

This works, but the predictions are not in the same row order as the original data:

> all.equal(out_of_fold$obs, BostonHousing$medv)
[1] "Mean relative difference: 0.4521906"

I know the train object returns a list of which indexes were used to train each fold:

> str(rpartFit$control$index)
List of 10
 $ Fold01: int [1:457] 1 2 3 4 5 6 7 8 9 10 ...
 $ Fold02: int [1:454] 2 3 4 8 10 11 12 13 14 15 ...
 $ Fold03: int [1:457] 1 2 3 4 5 6 7 8 9 10 ...
 $ Fold04: int [1:455] 1 2 3 5 6 7 8 9 10 11 ...
 $ Fold05: int [1:455] 1 2 3 4 5 6 7 8 9 10 ...
 $ Fold06: int [1:455] 1 2 3 4 5 6 7 8 9 10 ...
 $ Fold07: int [1:457] 1 3 4 5 6 7 8 9 10 13 ...
 $ Fold08: int [1:455] 1 2 4 5 6 7 9 11 12 14 ...
 $ Fold09: int [1:455] 1 2 3 4 5 6 7 8 9 10 ...
 $ Fold10: int [1:454] 1 2 3 4 5 6 7 8 9 10 ...

How can I use this information to put the observations in my out_of_fold object in the same order as the original BostonHousing dataset?

Zach asked Jun 29 '12 19:06



1 Answer

I'll add another column to the output that indicates the original row number for each sample in the next release (probably a month from now).

Max
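
Once such a row-number column exists, restoring the original order should be a single sort. The sketch below is only an illustration: it uses a small made-up data frame as a stand-in for rpartFit$pred, and it assumes the new column is named rowIndex (the name used in later caret releases; check names(rpartFit$pred) in your installed version).

```r
# Toy stand-in for the original data (six made-up response values).
original <- data.frame(medv = c(24.0, 21.6, 34.7, 33.4, 36.2, 28.7))

# Toy stand-in for rpartFit$pred after filtering to the best cp.
# rowIndex is an ASSUMED column name holding each sample's original row number.
out_of_fold <- data.frame(
  pred     = c(30.1, 25.2, 22.8, 31.0, 27.5, 33.9),  # made-up predictions
  obs      = original$medv[c(3, 1, 2, 5, 6, 4)],     # shuffled by fold
  rowIndex = c(3, 1, 2, 5, 6, 4),
  Resample = c("Fold1", "Fold1", "Fold2", "Fold2", "Fold3", "Fold3")
)

# Sort by original row number to restore the order of the source data.
out_of_fold <- out_of_fold[order(out_of_fold$rowIndex), ]

# The observed values now line up with the original response.
stopifnot(isTRUE(all.equal(out_of_fold$obs, original$medv)))
```

With a real fit, the same order() call on the filtered out_of_fold object should make all.equal(out_of_fold$obs, BostonHousing$medv) return TRUE, provided each row was held out exactly once (as in plain k-fold CV).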

topepo answered Oct 05 '22 04:10