
R caret: Maximizing sensitivity for a manually defined positive class during training (classification)

Tags: r, r-caret

Short Version:

Is there a way to instruct caret to train a classification model

  1. Using a user-defined label as the "positive class label"?
  2. Optimizing the model for sensitivity during training (instead of ROC)?

Long Version:

I have a dataframe

> feature1 <-                 c(1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0)
> feature2 <-                 c(1,0,1,1,1,0,1,1,1,0,1,1,1,0,1,1,1,0,1,1)
> feature3 <-                 c(0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0)
> TARGET <- factor(make.names(c(1,0,1,1,0,0,1,0,1,1,1,0,1,0,0,0,1,0,1,1)))
> df <- data.frame(feature1, feature2, feature3, TARGET)

Model training is implemented like this:

> ctrl <- trainControl(
+     method="repeatedcv",
+     repeats = 2)
> 
> tuneGrid <- expand.grid(k = c(2,5,7))
> 
> tune <- train(
+     TARGET ~ .,
+     metric = '???',
+     maximize = TRUE,
+     data = df,
+     method = "knn", 
+     trControl = ctrl, 
+     preProcess = c("center","scale"), 
+     tuneGrid = tuneGrid
+ )
> # predict() already returns a factor with levels X0/X1; keep it as a factor
> # so that confusionMatrix() can compare it with df$TARGET
> sclasses <- predict(tune, newdata = df)
> df$PREDICTION <- sclasses

I want to maximize the sensitivity (recall) = A / (A + C)

[Image: confusion matrix table with Event / No Event classes]

Here, Event (in the image) should be X1 = action taken in my case, but caret uses X0 = no action taken as the event.
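To make the A / (A + C) ratio concrete, here is a small base-R sketch that computes it for either choice of event class. The `truth` and `pred` vectors and the helper `sens_for` are made up for illustration; they are not output of the kNN model above.

```r
# Hypothetical labels, for illustration only (not produced by the kNN model)
truth <- factor(c("X1", "X0", "X1", "X1", "X0", "X0", "X0"), levels = c("X0", "X1"))
pred  <- factor(c("X1", "X0", "X0", "X1", "X1", "X0", "X0"), levels = c("X0", "X1"))

# Sensitivity = A / (A + C) for a chosen event class:
#   A = events that were predicted as events
#   C = events that were predicted as non-events
sens_for <- function(pred, truth, event) {
    A <- sum(pred == event & truth == event)
    C <- sum(pred != event & truth == event)
    A / (A + C)
}

sens_for(pred, truth, "X1")  # 2 / 3
sens_for(pred, truth, "X0")  # 3 / 4
```

The two calls give different values on the same predictions, which is why the choice of event class matters when sensitivity is the quantity being optimized.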

I can set the positive class for my confusion matrix using the positive argument, like this:

> confusionMatrix(df$PREDICTION, df$TARGET, positive = "X1")

But is there any way to set this while training (maximizing sensitivity)?

I already checked if there is another metric fitting my need, but I wasn't able to find one in the documentation. Do I have to implement my own summaryFunction for trainControl?

Thanks!

asked Mar 01 '16 14:03 by Boern


1 Answer

As far as I know, there is no direct way to specify the positive class in the training call (I have been searching for this myself for a while now). However, I found a workaround: reorder the levels of the target variable in the data frame. By default, caret treats the first level of the factor as the positive class, so making X1 the first level solves the problem. Relevel the target right after creating it:

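TARGET <- factor(make.names(c(1,0,1,1,0,0,1,0,1,1,1,0,1,0,0,0,1,0,1,1)))
TARGET <- relevel(TARGET, "X1")  # levels are now X1, X0
df <- data.frame(feature1, feature2, feature3, TARGET)  # rebuild df with the releveled target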
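Releveling only fixes which class counts as positive. To also select the model by sensitivity, one option is caret's built-in `twoClassSummary`, which reports ROC, Sens and Spec and, as far as I can tell, takes the first factor level as the event. The sketch below reuses the question's data and settings and assumes that `classProbs = TRUE` together with `metric = "Sens"` is what is wanted. The `set.seed(1)` call is my own addition for reproducibility. With only 20 rows, some resampling folds may contain a single class, so warnings are possible.

```r
library(caret)

feature1 <- c(1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0)
feature2 <- c(1,0,1,1,1,0,1,1,1,0,1,1,1,0,1,1,1,0,1,1)
feature3 <- c(0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0)
TARGET <- factor(make.names(c(1,0,1,1,0,0,1,0,1,1,1,0,1,0,0,0,1,0,1,1)))
TARGET <- relevel(TARGET, "X1")  # X1 becomes the first level (the event)
df <- data.frame(feature1, feature2, feature3, TARGET)

ctrl <- trainControl(
    method = "repeatedcv",
    repeats = 2,
    classProbs = TRUE,                 # twoClassSummary needs class probabilities
    summaryFunction = twoClassSummary  # computes ROC, Sens and Spec per resample
)

set.seed(1)  # assumption: fixed seed only so the resampling is repeatable
tune <- train(
    TARGET ~ .,
    data = df,
    method = "knn",
    metric = "Sens",   # pick k by resampled sensitivity instead of ROC
    maximize = TRUE,
    trControl = ctrl,
    preProcess = c("center", "scale"),
    tuneGrid = expand.grid(k = c(2, 5, 7))
)

tune$results  # one row per k, including a Sens column
```

Since X1 is now the first level, `confusionMatrix(predict(tune, newdata = df), df$TARGET)` should report X1 as the positive class without the `positive` argument.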

answered Nov 06 '22 11:11 by Bart VdW