I am trying to learn caret and caretList in R by following the caretEnsemble classification example tutorial.
I ran into a few errors and found fixes for some of the basic setup. However, I still get these warnings:
Warning messages:
1: In train.default(x, y, weights = w, ...) :
The metric "Accuracy" was not in the result set. ROC will be used instead.
2: In train.default(x, y, weights = w, ...) :
The metric "Accuracy" was not in the result set. ROC will be used instead.
My setup is:
#Libraries
library(caret)
library(devtools)
library(caretEnsemble)
#Data
library(mlbench)
dat <- mlbench.xor(500, 2)
X <- data.frame(dat$x)
Y <- factor(ifelse(dat$classes=='1', 'Yes', 'No'))
#Split train/test
train <- runif(nrow(X)) <= .66
#Setup CV Folds
#returnData=FALSE saves some space
folds=5
repeats=1
myControl <- trainControl(method='cv',
                          number=folds,
                          repeats=repeats,
                          returnResamp='none',
                          classProbs=TRUE,
                          returnData=FALSE,
                          savePredictions=TRUE,
                          verboseIter=TRUE,
                          allowParallel=TRUE,
                          summaryFunction=twoClassSummary,
                          index=createMultiFolds(Y[train],
                                                 k=folds,
                                                 times=repeats)
)
#Make list of all models
all.models<-caretList(Y~., data=X, trControl=myControl, methodList=c("blackboost", "parRF"))
I edited the section of "train all models" using caretList so that it will work with caretEnsemble and caretStack further down the code (link provided above).
How do I get the accuracies so that I can use them in caretEnsemble and caretStack?
I assume you want 'Accuracy' as the summary metric for selecting the optimal base learner models across their resamples, and later the metalearner via caretEnsemble or caretStack.
In that case you must not set summaryFunction = twoClassSummary in trainControl, because twoClassSummary reports ROC, Sens and Spec but no accuracy, so train falls back to 'ROC' as the performance metric instead of 'Accuracy'. Instead, keep the default setting for summaryFunction (that is, do not specify it in trainControl at all). Then train, which is called via caretList, will automatically use 'Accuracy' as the performance metric because the response is categorical.
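To see where the warning comes from, it can help to compare the metric names the two summary functions return. The following is only a minimal sketch: the data frame `toy` and its random predictions are made up for illustration and merely mimic the obs/pred/class-probability layout that caret passes to a summary function.

```r
library(caret)

set.seed(1)
lev <- c('No', 'Yes')
n <- 40

# Hypothetical hold-out predictions (not from a real model): observed class,
# predicted class, and one probability column per class level.
p_yes <- runif(n)
toy <- data.frame(
  obs  = factor(sample(lev, n, replace = TRUE), levels = lev),
  pred = factor(ifelse(p_yes > 0.5, 'Yes', 'No'), levels = lev),
  No   = 1 - p_yes,
  Yes  = p_yes
)

# Default summary for a factor response: includes Accuracy.
default_metrics <- names(defaultSummary(toy, lev = lev))
# twoClassSummary: no Accuracy among its metrics.
twoclass_metrics <- names(twoClassSummary(toy, lev = lev))

print(default_metrics)   # "Accuracy" "Kappa"
print(twoclass_metrics)  # "ROC" "Sens" "Spec"
```

Since train asks for metric = "Accuracy" by default when the response is a factor, a summary function that only returns ROC, Sens and Spec leaves it nothing to match, hence the fallback to ROC.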
In addition, there are a few other things to note:
Do not disable returnResamp in trainControl (the question sets returnResamp='none'), because when you do, you won't be able to compare the models' individual accuracies later via summary(resamples(model.list)).
Pass only the training rows to caretList. The correct caretList call should begin like this: caretList(Y[train] ~ ., data=X[train, ], ...
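Putting these points together, the setup might look roughly like the sketch below. Several things here are assumptions rather than part of the original code: the seed, the helper data frame `train_df` (a stand-in for the `Y[train] ~ ., data=X[train, ]` form), `savePredictions = 'final'`, and the lightweight learners `rpart` and `glm`, which replace `blackboost` and `parRF` only to keep the example quick to run.

```r
library(caret)
library(caretEnsemble)
library(mlbench)

set.seed(42)  # assumption: added only for reproducibility
dat <- mlbench.xor(500, 2)
X <- data.frame(dat$x)
Y <- factor(ifelse(dat$classes == '1', 'Yes', 'No'))
train <- runif(nrow(X)) <= .66

# Hypothetical helper: training rows only, response stored as column Y.
train_df <- data.frame(X[train, ], Y = Y[train])

folds <- 5

# summaryFunction and returnResamp are left at their defaults, so Accuracy
# is computed and the per-fold results are kept for resamples().
myControl <- trainControl(method = 'cv',
                          number = folds,
                          classProbs = TRUE,
                          savePredictions = 'final',
                          index = createMultiFolds(train_df$Y,
                                                   k = folds,
                                                   times = 1))

# Assumption: swap methodList back to c("blackboost", "parRF") to match
# the question; the learners below are placeholders.
all.models <- caretList(Y ~ ., data = train_df,
                        trControl = myControl,
                        methodList = c('rpart', 'glm'))

# Per-model resampling results, including Accuracy.
res <- resamples(all.models)
print(summary(res))
```

The same `all.models` object can then be handed to caretEnsemble or caretStack, as in the tutorial.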