What's the difference between
predict(rf, newdata=testSet)
and
predict(rf$finalModel, newdata=testSet)
I train the model with preProcess=c("center", "scale"):
tc <- trainControl("repeatedcv", number=10, repeats=10, classProbs=TRUE, savePred=T)
rf <- train(y~., data=trainingSet, method="rf", trControl=tc, preProc=c("center", "scale"))
and I receive 0 true positives when I run it on a centered and scaled testSet:
testSetCS <- testSet
xTrans <- preProcess(testSetCS)           # preProcess estimated on the test set itself
testSetCS <- predict(xTrans, testSet)     # centered and scaled copy of the test set
testSet$Prediction <- predict(rf, newdata=testSet)
testSetCS$Prediction <- predict(rf, newdata=testSetCS)
but receive some true positives when I run it on the unscaled testSet. I have to use rf$finalModel to get some true positives on the centered and scaled testSet, and the rf object on the unscaled one... what am I missing?
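(For reference, a minimal sketch of the usual workflow, assuming the outcome column is named y as in the formula above: the centering/scaling parameters are estimated on the training predictors and the same preProcess object is then applied to the test predictors.)

## estimate the transformation on the training predictors only
xTrans <- preProcess(trainingSet[, setdiff(names(trainingSet), "y")], method=c("center", "scale"))
## apply the training-set parameters to the test predictors
testSetCS <- predict(xTrans, testSet[, setdiff(names(testSet), "y")])
testSetCS$y <- testSet$y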
Edit:
Tests:
tc <- trainControl("repeatedcv", number=10, repeats=10, classProbs=TRUE, savePred=T)
RF <- train(Y~., data=trainingSet, method="rf", trControl=tc)                                   # normal training data
RF.CS <- train(Y~., data=trainingSet, method="rf", trControl=tc, preProc=c("center", "scale"))  # centered and scaled training data
On the normal testSet:
RF predicts reasonably (Sensitivity = 0.33, Specificity = 0.97)
RF$finalModel predicts badly (Sensitivity = 0.74, Specificity = 0.36)
RF.CS predicts reasonably (Sensitivity = 0.31, Specificity = 0.97)
RF.CS$finalModel gives the same results as RF.CS (Sensitivity = 0.31, Specificity = 0.97)
On the centered and scaled testSetCS:
RF predicts very badly (Sensitivity = 0.00, Specificity = 1.00)
RF$finalModel predicts reasonably (Sensitivity = 0.33, Specificity = 0.98)
RF.CS predicts like RF (Sensitivity = 0.00, Specificity = 1.00)
RF.CS$finalModel predicts like RF (Sensitivity = 0.00, Specificity = 1.00)
So it seems as if $finalModel needs the testSet in the same format as the trainingSet, whereas the trained object expects only uncentered and unscaled data, regardless of the selected preProcess parameter?
Prediction code (where testSet is the normal data and testSetCS is centered and scaled):
testSet$Prediction <- predict(RF, newdata=testSet)
testSet$PredictionFM <- predict(RF$finalModel, newdata=testSet)
testSet$PredictionCS <- predict(RF.CS, newdata=testSet)
testSet$PredictionCSFM <- predict(RF.CS$finalModel, newdata=testSet)
testSetCS$Prediction <- predict(RF, newdata=testSetCS)
testSetCS$PredictionFM <- predict(RF$finalModel, newdata=testSetCS)
testSetCS$PredictionCS <- predict(RF.CS, newdata=testSetCS)
testSetCS$PredictionCSFM <- predict(RF.CS$finalModel, newdata=testSetCS)
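A minimal sketch of that distinction, assuming the outcome column is named Y as in the formula above and that the preprocessing parameters are stored by train in RF.CS$preProcess:

## predict() on the train object applies the center/scale step that was
## estimated on the training predictors, so it should be given raw test data.
p_caret <- predict(RF.CS, newdata=testSet)

## The finalModel is the underlying randomForest fit and skips that step;
## the manual equivalent is to apply the stored training-set transform first.
testX    <- testSet[, setdiff(names(testSet), "Y")]   # "Y" assumed from the formula above
testXCS  <- predict(RF.CS$preProcess, newdata=testX)
p_manual <- predict(RF.CS$finalModel, newdata=testXCS)

table(p_caret, p_manual)   # the two routes should then largely agree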
Frank,
This is really similar to your other question on Cross Validated.
You really need to:
1) show your exact prediction code for each result, and
2) give us a reproducible example.
With the normal testSet, RF.CS and RF.CS$finalModel should not be giving you the same results, and we should be able to reproduce that. Plus, there are syntax errors in your code, so it can't be exactly what you executed.
Finally, I'm not really sure why you would use the finalModel object at all. The point of train is to handle the details, and doing things this way (which is your option) circumvents the complete set of code that would normally be applied.
Here is a reproducible example:
library(caret)     # for createDataPartition, preProcess, trainControl, train
library(mlbench)
data(Sonar)

set.seed(1)
inTrain <- createDataPartition(Sonar$Class)
training <- Sonar[ inTrain[[1]], ]
testing  <- Sonar[-inTrain[[1]], ]

## center/scale manually, using parameters estimated on the training set
pp <- preProcess(training[, -ncol(Sonar)])
training2 <- predict(pp, training[, -ncol(Sonar)])
training2$Class <- training$Class
testing2 <- predict(pp, testing[, -ncol(Sonar)])
testing2$Class <- testing$Class
tc <- trainControl("repeatedcv",
number=10,
repeats=10,
classProbs=TRUE,
savePred=T)
set.seed(2)
RF <- train(Class~., data= training,
method="rf",
trControl=tc)
#normal trainingData
set.seed(2)
RF.CS <- train(Class~., data= training,
method="rf",
trControl=tc,
preProc=c("center", "scale"))
#scaled and centered trainingData
Here are some results:
> ## These should not be the same
> all.equal(predict(RF, testing, type = "prob")[,1],
+ predict(RF, testing2, type = "prob")[,1])
[1] "Mean relative difference: 0.4067554"
>
> ## Nor should these
> all.equal(predict(RF.CS, testing, type = "prob")[,1],
+ predict(RF.CS, testing2, type = "prob")[,1])
[1] "Mean relative difference: 0.3924037"
>
> all.equal(predict(RF.CS, testing, type = "prob")[,1],
+ predict(RF.CS$finalModel, testing, type = "prob")[,1])
[1] "names for current but not for target"
[2] "Mean relative difference: 0.7452435"
>
> ## These should be and are close (just based on the
> ## random sampling used in the final RF fits)
> all.equal(predict(RF, testing, type = "prob")[,1],
+ predict(RF.CS, testing, type = "prob")[,1])
[1] "Mean relative difference: 0.04198887"
Max