I used caret for logistic regression in R:
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 10,
                     savePredictions = TRUE)
mod_fit <- train(Y ~ ., data = df, method = "glm", family = "binomial",
                 trControl = ctrl)
print(mod_fit)
The default metrics printed are accuracy and Cohen's kappa. I want to extract the corresponding classification metrics such as sensitivity, specificity, positive predictive value, etc., but I cannot find an easy way to do it. The final model is provided, but it is trained on all of the data (as far as I can tell from the documentation), so I cannot use it for predicting anew.
confusionMatrix calculates all the required metrics, but passing it as the summary function doesn't work:
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 10,
                     savePredictions = TRUE, summaryFunction = confusionMatrix)
mod_fit <- train(Y ~ ., data = df, method = "glm", family = "binomial",
                 trControl = ctrl)
Error: `data` and `reference` should be factors with the same levels.
13. stop("`data` and `reference` should be factors with the same levels.",
        call. = FALSE)
12. confusionMatrix.default(testOutput, lev, method)
11. ctrl$summaryFunction(testOutput, lev, method)
Is there a way to extract this information in addition to accuracy and kappa, or to somehow find it in the train object returned by caret's train?
Thanks in advance!
Caret already has summary functions to output all the metrics you mention:
- defaultSummary outputs Accuracy and Kappa
- twoClassSummary outputs AUC (area under the ROC curve - see the last lines of this answer), sensitivity and specificity
- prSummary outputs precision and recall
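For instance, to get only AUC, sensitivity and specificity you can plug twoClassSummary straight into trainControl. A minimal sketch (ctrl_roc is just an illustrative name; metric = "ROC" makes train select the model by AUC rather than accuracy):
ctrl_roc <- trainControl(method = "repeatedcv", number = 10, repeats = 10,
                         classProbs = TRUE,   # class probabilities are needed for ROC
                         summaryFunction = twoClassSummary)
# then call train(..., trControl = ctrl_roc, metric = "ROC")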
In order to get combined metrics you can write your own summary function that combines the outputs of these three:
library(caret)

MySummary <- function(data, lev = NULL, model = NULL){
  a1 <- defaultSummary(data, lev, model)
  b1 <- twoClassSummary(data, lev, model)
  c1 <- prSummary(data, lev, model)
  out <- c(a1, b1, c1)
  out
}
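As a quick sanity check, you can call MySummary directly on a small hand-made resampling data frame. The values below are made up; the probability columns must be named after the factor levels, and prSummary additionally needs the MLmetrics package installed:
# toy data frame mimicking what caret passes to a summary function
d <- data.frame(obs  = factor(c("M", "R", "M", "R"), levels = c("M", "R")),
                pred = factor(c("M", "R", "R", "R"), levels = c("M", "R")),
                M    = c(0.9, 0.2, 0.4, 0.1),   # predicted probability of class M
                R    = c(0.1, 0.8, 0.6, 0.9))   # predicted probability of class R
MySummary(d, lev = c("M", "R"))
# returns a named vector: Accuracy, Kappa, ROC, Sens, Spec, AUC, Precision, Recall, F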
Let's try it on the Sonar data set:
library(mlbench)
data("Sonar")
When defining the train control it is important to set classProbs = TRUE, since some of these metrics (ROC and prAUC) cannot be calculated from the predicted classes alone; they require the predicted class probabilities.
ctrl <- trainControl(method = "repeatedcv",
                     number = 10,
                     savePredictions = TRUE,
                     summaryFunction = MySummary,
                     classProbs = TRUE)
Now fit the model of your choice:
mod_fit <- train(Class ~ .,
                 data = Sonar,
                 method = "rf",
                 trControl = ctrl)
mod_fit$results
#output
mtry Accuracy Kappa ROC Sens Spec AUC Precision Recall F AccuracySD KappaSD
1 2 0.8364069 0.6666364 0.9454798 0.9280303 0.7333333 0.8683726 0.8121087 0.9280303 0.8621526 0.10570484 0.2162077
2 31 0.8179870 0.6307880 0.9208081 0.8840909 0.7411111 0.8450612 0.8074942 0.8840909 0.8374326 0.06076222 0.1221844
3 60 0.8034632 0.6017979 0.9049242 0.8659091 0.7311111 0.8332068 0.7966889 0.8659091 0.8229330 0.06795824 0.1369086
ROCSD SensSD SpecSD AUCSD PrecisionSD RecallSD FSD
1 0.04393947 0.05727927 0.1948585 0.03410854 0.12717667 0.05727927 0.08482963
2 0.04995650 0.11053858 0.1398657 0.04694993 0.09075782 0.11053858 0.05772388
3 0.04965178 0.12047598 0.1387580 0.04820979 0.08951728 0.12047598 0.06715206
In this output, ROC is in fact the area under the ROC curve (usually called AUC), while AUC is the area under the precision-recall curve across all cutoffs.
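If you also want the confusion-matrix style metrics the question asked about (sensitivity, specificity, positive predictive value, etc.) on the held-out folds, one option is to compute them from the cross-validated predictions that savePredictions = TRUE stores in mod_fit$pred. A sketch (filtering to the finally selected mtry is an assumption about which rows you want):
# keep only predictions made with the finally selected tuning parameter
best_preds <- mod_fit$pred[mod_fit$pred$mtry == mod_fit$bestTune$mtry, ]
# confusionMatrix reports sensitivity, specificity, PPV, NPV, etc.
confusionMatrix(data = best_preds$pred, reference = best_preds$obs)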