Caret package Custom metric

Tags: r, r-caret

I'm using the caret function train() in one of my projects and I'd like to add a custom metric, the F1 score. I looked at the caret package documentation, but I cannot understand how to build this score with the parameters available.

The documentation gives the following example of a custom metric:

## Example with a custom metric
library(caret)
library(mlbench)   # provides the BostonHousing data
data(BostonHousing)

madSummary <- function(data, lev = NULL, model = NULL) {
  # data$obs holds the observed values, data$pred the predictions
  out <- mad(data$obs - data$pred, na.rm = TRUE)
  names(out) <- "MAD"
  out
}

robustControl <- trainControl(summaryFunction = madSummary)
marsGrid <- expand.grid(degree = 1, nprune = (1:10) * 2)

earthFit <- train(medv ~ .,
                  data = BostonHousing,
                  method = "earth",
                  tuneGrid = marsGrid,
                  metric = "MAD",
                  maximize = FALSE,
                  trControl = robustControl)

Update:

I tried your code, but it doesn't work with multiple classes, as in the code below. (An F1 score is displayed, but it looks wrong.) I'm not sure, but I think the function F1_Score works only on binary classes.

library(caret)
library(MLmetrics)

set.seed(346)
dat <- iris

## See http://topepo.github.io/caret/training.html#metrics
f1 <- function(data, lev = NULL, model = NULL) {
  print(data)
  f1_val <- F1_Score(y_pred = data$pred, y_true = data$obs)
  c(F1 = f1_val)
}

# Split the data: 70% for training, 30% for testing
in_train <- createDataPartition(dat$Species, p = .70, list = FALSE)

trainClass <- dat[in_train,]
testClass <- dat[-in_train,]



set.seed(35)
mod <- train(Species ~ ., data = trainClass ,
             method = "rpart",
             metric = "F1",
             trControl = trainControl(summaryFunction = f1, 
                                  classProbs = TRUE))

print(mod)

I also coded a manual F1 score that takes the confusion matrix as its input. (I'm not sure whether a confusion matrix is available inside "summaryFunction".)

F1_score <- function(mat, algoName) {
  ##
  ## Compute the F1 score from a confusion matrix
  ##

  # Remark: rows = predictions // columns = real values
  # Precision divides by the number predicted (row sums),
  # recall divides by the number actually observed (column sums).
  precision <- diag(mat) / rowSums(mat)
  recall    <- diag(mat) / colSums(mat)
  f1        <- 2 * (precision * recall) / (precision + recall)

  # Label the per-class scores
  f1_mat <- matrix(f1, nrow = 1,
                   dimnames = list(algoName, colnames(mat)))

  # Display the F1 score for each class
  # (inside a function, print() is needed for the value to show)
  print(f1_mat)

  # Return the average F1 score over the classes
  mean(f1)
}

490

asked Jun 06 '16 20:06

MarcelRitos

2 Answers

For the two-class case, you can try the following:

mod <- train(Class ~ ., 
             data = dat,
             method = "rpart",
             tuneLength = 5,
             metric = "F",
             trControl = trainControl(summaryFunction = prSummary, 
                                      classProbs = TRUE))

Or define a custom summary function that combines both twoClassSummary and prSummary (my current favorite). It provides the following evaluation metrics: AUROC, Spec, Sens, AUPRC, Precision, Recall, F. Any of these can be used as the metric argument. It also handles the special case I mentioned in my comment on the accepted answer (F is NA).

comboSummary <- function(data, lev = NULL, model = NULL) {
  out <- c(twoClassSummary(data, lev, model), prSummary(data, lev, model))

  # special case: replace a missing F with 0
  # (out is a named numeric vector, so index with [[ ]] rather than $)
  if (is.na(out[["F"]])) out[["F"]] <- 0
  names(out) <- gsub("AUC", "AUPRC", names(out))
  names(out) <- gsub("ROC", "AUROC", names(out))
  return(out)
}

mod <- train(Class ~ ., 
             data = dat,
             method = "rpart",
             tuneLength = 5,
             metric = "F",
             trControl = trainControl(summaryFunction = comboSummary, 
                                      classProbs = TRUE))
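One way an F value can end up undefined is when the model never predicts the positive class in a resample: precision becomes 0/0, and F inherits the missing value. The base-R sketch below illustrates that situation and the replacement by 0. The helper `prf` and the toy vectors are hypothetical and only for illustration; this is not how prSummary is implemented internally.

```r
# Precision, recall and F for a chosen positive class, in base R.
# `prf` is a hypothetical helper, not part of caret or MLmetrics.
prf <- function(obs, pred, positive) {
  tp <- sum(pred == positive & obs == positive)
  fp <- sum(pred == positive & obs != positive)
  fn <- sum(pred != positive & obs == positive)
  precision <- tp / (tp + fp)   # 0/0 = NaN when nothing is predicted positive
  recall    <- tp / (tp + fn)
  f <- 2 * precision * recall / (precision + recall)
  c(Precision = precision, Recall = recall, F = f)
}

# Toy resample in which the model never predicts "yes"
obs  <- factor(c("yes", "yes", "no", "no"), levels = c("yes", "no"))
pred <- factor(c("no",  "no",  "no", "no"), levels = c("yes", "no"))

out <- prf(obs, pred, positive = "yes")
f_was_missing <- is.na(out[["F"]])

# Same idea as in comboSummary: treat an undefined F as 0
if (is.na(out[["F"]])) out[["F"]] <- 0
out
```

Whether 0 is the right stand-in depends on the use case. It makes such resamples count as the worst possible score instead of being dropped from the average.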

136

answered Oct 23 '22 20:10

Brian D

You should look at The caret Package - Alternate Performance Metrics for details. A working example:

library(caret)
library(MLmetrics)

set.seed(346)
dat <- twoClassSim(200)

## See https://topepo.github.io/caret/model-training-and-tuning.html#metrics
f1 <- function(data, lev = NULL, model = NULL) {
  f1_val <- F1_Score(y_pred = data$pred, y_true = data$obs, positive = lev[1])
  c(F1 = f1_val)
}

set.seed(35)
mod <- train(Class ~ ., data = dat,
             method = "rpart",
             tuneLength = 5,
             metric = "F1",
             trControl = trainControl(summaryFunction = f1, 
                                      classProbs = TRUE))
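For the multi-class case raised in the question's update, a summary function can build the confusion matrix itself from data$obs and data$pred. The following is a minimal sketch, assuming a macro-averaged F1 (the plain mean of the per-class scores) is what is wanted. The name `macroF1` is hypothetical, and treating an undefined per-class F1 as 0 is an assumption, not something caret prescribes.

```r
# Macro-averaged F1 for any number of classes, written in base R.
# `macroF1` is a hypothetical name, not part of caret or MLmetrics.
macroF1 <- function(data, lev = NULL, model = NULL) {
  if (is.null(lev)) lev <- levels(data$obs)

  # Confusion matrix: rows = predictions, columns = observed classes
  cm <- table(factor(data$pred, levels = lev),
              factor(data$obs,  levels = lev))

  tp        <- diag(cm)
  precision <- tp / rowSums(cm)
  recall    <- tp / colSums(cm)
  f1        <- 2 * precision * recall / (precision + recall)

  # Assumption: a class whose F1 is undefined (0/0) counts as 0
  f1[is.na(f1)] <- 0

  c(F1 = mean(f1))
}

# Quick check on a toy data frame shaped like the one caret passes in
toy <- data.frame(
  obs  = factor(c("a", "a", "b", "b", "c", "c"), levels = c("a", "b", "c")),
  pred = factor(c("a", "b", "b", "b", "c", "a"), levels = c("a", "b", "c"))
)
res <- macroF1(toy, lev = levels(toy$obs))
res
```

With the iris example from the question, this could be passed as trainControl(summaryFunction = macroF1) together with metric = "F1". Since it only needs the obs and pred columns, classProbs = TRUE should not be required for it.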

answered Oct 23 '22 20:10

topepo
