Using the R package caret, how can I generate a ROC curve based on the cross-validation results of the train() function?
Say, I do the following:
data(Sonar)
ctrl <- trainControl(method = "cv",
                     summaryFunction = twoClassSummary,
                     classProbs = TRUE)
rfFit <- train(Class ~ ., data = Sonar,
               method = "rf",
               preProc = c("center", "scale"),
               trControl = ctrl)
The training function sweeps over a range of values of the mtry parameter and calculates the ROC AUC. I would like to see the associated ROC curve -- how do I do that?
Note: if the resampling method is LOOCV, then rfFit will contain a non-null data frame in the rfFit$pred slot, which seems to be exactly what I need. However, I need that for the "cv" method (k-fold cross-validation) rather than LOO.
Also: no, the roc function that was included in former versions of caret is not an answer -- it is a low-level function, and you can't use it if you don't have the prediction probabilities for each cross-validated sample.
The only thing missing from ctrl is the savePredictions = TRUE argument (this also works for other resampling methods):
library(caret)
library(mlbench)
data(Sonar)

ctrl <- trainControl(method = "cv",
                     summaryFunction = twoClassSummary,
                     classProbs = TRUE,
                     savePredictions = TRUE)
rfFit <- train(Class ~ ., data = Sonar,
               method = "rf",
               preProc = c("center", "scale"),
               trControl = ctrl)

library(pROC)
# Select a parameter setting
selectedIndices <- rfFit$pred$mtry == 2
# Plot:
plot.roc(rfFit$pred$obs[selectedIndices],
         rfFit$pred$M[selectedIndices])
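As a small aside (not part of the original answer): rather than hardcoding mtry == 2, you can filter on whatever tuning value train() actually selected, via its bestTune slot. A minimal sketch:

# Sketch: pick the predictions made with the mtry value train() selected,
# so the curve corresponds to the tuned model rather than a hardcoded value.
selectedIndices <- rfFit$pred$mtry == rfFit$bestTune$mtry
plot.roc(rfFit$pred$obs[selectedIndices],
         rfFit$pred$M[selectedIndices])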
Maybe I am missing something, but a small concern is that train always estimates slightly different AUC values than plot.roc and pROC::auc (absolute difference < 0.005), even though twoClassSummary itself uses pROC::auc to estimate the AUC. Edit: I assume this happens because the ROC value reported by train is the average of the AUCs computed on the separate CV folds, whereas here we compute the AUC over all resamples at once to obtain a single overall AUC.
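To illustrate that point, here is a sketch (not from the original answer; it assumes rfFit$pred contains the usual Resample column that caret adds when savePredictions = TRUE with k-fold CV):

library(pROC)
preds <- rfFit$pred[rfFit$pred$mtry == 2, ]

# AUC computed within each CV fold, then averaged -- this should be close
# to the ROC value that train() reports for mtry = 2.
fold_aucs <- sapply(split(preds, preds$Resample), function(fold)
  as.numeric(auc(fold$obs, fold$M, levels = c("R", "M"), direction = "<")))
mean(fold_aucs)

# AUC pooled over all resamples at once -- what plot.roc() above is based on.
as.numeric(auc(preds$obs, preds$M, levels = c("R", "M"), direction = "<"))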
Update: since this is getting a bit of attention, here's a solution using plotROC::geom_roc() for ggplot2:
library(ggplot2)
library(plotROC)

ggplot(rfFit$pred[selectedIndices, ],
       aes(m = M, d = factor(obs, levels = c("R", "M")))) +
  geom_roc(hjust = -0.4, vjust = 1.5) +
  coord_equal()
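If you also want the AUC from this plot, plotROC provides (as far as I know) the style_roc() and calc_auc() helpers; a minimal sketch under that assumption:

# Sketch: build the same plot, then pull the AUC out of the ggplot object.
p <- ggplot(rfFit$pred[selectedIndices, ],
            aes(m = M, d = factor(obs, levels = c("R", "M")))) +
  geom_roc(n.cuts = 0) +
  coord_equal() +
  style_roc()
calc_auc(p)  # returns a data frame containing the AUC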