While using the caret package for machine learning, I am stuck with caret's default "positive" outcome choice, i.e. the first level of the outcome factor in binary classification problems.
The package says it can be set to the alternative level. Can anybody help me define the positive outcome?
Thank you.
The caret package (short for Classification And REgression Training) contains functions to streamline the model training process for complex regression and classification problems. It is one of the most comprehensive machine learning packages in R and covers the vast majority of classification and regression workflows.
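For context, here is a minimal sketch of what a caret training call typically looks like. The dataset, model method, and resampling settings below are arbitrary choices for illustration, not part of the question:

library(caret)

set.seed(42)
# illustrative only: a 5-fold cross-validated decision tree on the built-in iris data
fit <- train(Species ~ ., data = iris, method = "rpart",
             trControl = trainControl(method = "cv", number = 5))
fit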
The “no-information rate” is the largest proportion of the observed classes (there were more class 2 data than class 1 in this test set). A hypothesis test is also computed to evaluate whether the overall accuracy rate is greater than the rate of the largest class.
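As a rough sketch of that hypothesis test: caret reports a one-sided binomial test of the observed accuracy against the no-information rate. Using the counts from the example below (285 correct predictions out of 344 observations, largest class proportion 0.75), something like this reproduces the reported p-value:

# no-information rate = proportion of the largest observed class
nir <- 258 / 344                     # 0.75 in the example below
# one-sided test: is the observed accuracy (285/344) greater than the NIR?
binom.test(285, 344, p = nir, alternative = "greater")
# the p-value should match the "P-Value [Acc > NIR]" line reported by confusionMatrix()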
Look at this example, extended from the caret examples for confusionMatrix().
library(caret)

lvs <- c("normal", "abnormal")
truth <- factor(rep(lvs, times = c(86, 258)),
                levels = rev(lvs))
pred <- factor(
  c(rep(lvs, times = c(54, 32)),
    rep(lvs, times = c(27, 231))),
  levels = rev(lvs))
xtab <- table(pred, truth)
str(truth)
Factor w/ 2 levels "abnormal","normal": 2 2 2 2 2 2 2 2 2 2 ...
Because abnormal is the first level, this will be the default positive class
confusionMatrix(xtab)
Confusion Matrix and Statistics
          truth
pred       abnormal normal
  abnormal      231     32
  normal         27     54
Accuracy : 0.8285
95% CI : (0.7844, 0.8668)
No Information Rate : 0.75
P-Value [Acc > NIR] : 0.0003097
Kappa : 0.5336
Mcnemar's Test P-Value : 0.6025370
Sensitivity : 0.8953
Specificity : 0.6279
Pos Pred Value : 0.8783
Neg Pred Value : 0.6667
Prevalence : 0.7500
Detection Rate : 0.6715
Detection Prevalence : 0.7645
Balanced Accuracy : 0.7616
'Positive' Class : abnormal
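As a sanity check, the headline statistics above can be reproduced by hand from xtab, with abnormal as the positive class:

# sensitivity = true positives / all actual positives
231 / (231 + 27)      # 0.8953
# specificity = true negatives / all actual negatives
54  / (54 + 32)       # 0.6279
# positive predictive value = true positives / all predicted positives
231 / (231 + 32)      # 0.8783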
To change the positive class to normal, just add positive = "normal" to the confusionMatrix() call. Notice the differences from the previous output; they start at the sensitivity and the subsequent calculations.
confusionMatrix(xtab, positive = "normal")
Confusion Matrix and Statistics
          truth
pred       abnormal normal
  abnormal      231     32
  normal         27     54
Accuracy : 0.8285
95% CI : (0.7844, 0.8668)
No Information Rate : 0.75
P-Value [Acc > NIR] : 0.0003097
Kappa : 0.5336
Mcnemar's Test P-Value : 0.6025370
Sensitivity : 0.6279
Specificity : 0.8953
Pos Pred Value : 0.6667
Neg Pred Value : 0.8783
Prevalence : 0.2500
Detection Rate : 0.1570
Detection Prevalence : 0.2355
Balanced Accuracy : 0.7616
'Positive' Class : normal
Changing the positive class:
One straightforward way of doing this is by re-leveling the target variable.
For example, in the breast cancer Wisconsin dataset, the first level of Diagnosis determines the default positive class. The reference level of Diagnosis is:
cancer <- read.csv("breast-cancer-wisconsin.csv")
cancer$Diagnosis <- as.factor(cancer$Diagnosis)
levels(cancer$Diagnosis)
[1] "Benign" "Malignant"
After performing the test-train split and fitting a model, the resulting confusion matrix and performance measures are:
Confusion Matrix and Statistics
           Actual
predicted   Benign Malignant
  Benign       115         7
  Malignant      2        80
Accuracy : 0.9559
95% CI : (0.9179, 0.9796)
No Information Rate : 0.5735
P-Value [Acc > NIR] : <2e-16
Kappa : 0.9091
Mcnemar's Test P-Value : 0.1824
Sensitivity : 0.9829
Specificity : 0.9195
Pos Pred Value : 0.9426
Neg Pred Value : 0.9756
Prevalence : 0.5735
Detection Rate : 0.5637
Detection Prevalence : 0.5980
Balanced Accuracy : 0.9512
'Positive' Class : Benign
Note that the positive class is Benign.
Changing the positive class to "Malignant" can be done with the relevel() function, which changes the reference level of the variable.
cancer$Diagnosis <- relevel(cancer$Diagnosis, ref = "Malignant")
levels(cancer$Diagnosis)
[1] "Malignant" "Benign"
Again, after performing the test-train split and fitting the model, the confusion matrix and performance measures with the changed reference level are:
Confusion Matrix and Statistics
           Actual
predicted   Malignant Benign
  Malignant        80      2
  Benign            7    115
Accuracy : 0.9559
95% CI : (0.9179, 0.9796)
No Information Rate : 0.5735
P-Value [Acc > NIR] : <2e-16
Kappa : 0.9091
Mcnemar's Test P-Value : 0.1824
Sensitivity : 0.9195
Specificity : 0.9829
Pos Pred Value : 0.9756
Neg Pred Value : 0.9426
Prevalence : 0.4265
Detection Rate : 0.3922
Detection Prevalence : 0.4020
Balanced Accuracy : 0.9512
'Positive' Class : Malignant
Here the positive class is Malignant.