
Caret package - defining Positive result

Tags: r, r-caret

While using the caret package for machine learning, I am stuck with caret's default choice of the "positive" outcome, i.e., the first level of the outcome factor in binary classification problems.

The package documentation says the positive class can be set to the other level. Can anybody help me define the positive outcome?

Thank you.

duvvurum asked Oct 30 '15 08:10





2 Answers

Look at this example, which I extended from the caret examples for confusionMatrix.

library(caret)

lvs <- c("normal", "abnormal")
truth <- factor(rep(lvs, times = c(86, 258)),
                levels = rev(lvs))
pred <- factor(
  c(
    rep(lvs, times = c(54, 32)),
    rep(lvs, times = c(27, 231))),               
  levels = rev(lvs))

xtab <- table(pred, truth)

str(truth)
Factor w/ 2 levels "abnormal","normal": 2 2 2 2 2 2 2 2 2 2 ...

Because abnormal is the first level, it will be the default positive class.
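A base-R sketch of where the "first level" comes from (no caret needed): factor() sorts the levels alphabetically unless levels = is given, and the default positive class is simply levels(x)[1]. The names truth_default, default_positive and truth_normal_first are illustrative only.

```r
lvs <- c("normal", "abnormal")

# Without levels =, factor() sorts the levels alphabetically
truth_default <- factor(rep(lvs, times = c(86, 258)))
levels(truth_default)               # "abnormal" "normal"

# With levels = rev(lvs), as in the example, the order is set explicitly
truth <- factor(rep(lvs, times = c(86, 258)), levels = rev(lvs))
default_positive <- levels(truth)[1]
default_positive                    # "abnormal"

# Listing "normal" first would make it the default positive class instead
truth_normal_first <- factor(truth, levels = c("normal", "abnormal"))
levels(truth_normal_first)[1]       # "normal"
```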

confusionMatrix(xtab)

Confusion Matrix and Statistics

          truth
pred       abnormal normal
  abnormal      231     32
  normal         27     54

               Accuracy : 0.8285          
                 95% CI : (0.7844, 0.8668)
    No Information Rate : 0.75            
    P-Value [Acc > NIR] : 0.0003097       

                  Kappa : 0.5336          
 Mcnemar's Test P-Value : 0.6025370       

            Sensitivity : 0.8953          
            Specificity : 0.6279          
         Pos Pred Value : 0.8783          
         Neg Pred Value : 0.6667          
             Prevalence : 0.7500          
         Detection Rate : 0.6715          
   Detection Prevalence : 0.7645          
      Balanced Accuracy : 0.7616          

       'Positive' Class : abnormal     
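The rows from Accuracy through Mcnemar's Test P-Value do not depend on which class is positive. A minimal base-R sketch that reproduces them from the table counts, assuming (as caret's documentation describes) that the 95% CI and the P-Value [Acc > NIR] come from exact binomial tests and that the NIR is the largest observed class proportion:

```r
# Counts from the table above (rows = pred, columns = truth)
xtab <- matrix(c(231, 27, 32, 54), nrow = 2,
               dimnames = list(pred  = c("abnormal", "normal"),
                               truth = c("abnormal", "normal")))

n       <- sum(xtab)            # 344 cases
correct <- sum(diag(xtab))      # 285 correct predictions

accuracy <- correct / n                     # 0.8285
ci       <- binom.test(correct, n)$conf.int # exact 95% CI for accuracy

# No-information rate: proportion of the largest observed (truth) class
nir <- max(colSums(xtab)) / n               # 0.75

# One-sided exact binomial test of accuracy > NIR
p_acc_nir <- binom.test(correct, n, p = nir, alternative = "greater")$p.value

# Cohen's kappa from observed vs chance agreement
pe    <- sum(rowSums(xtab) * colSums(xtab)) / n^2
kappa <- (accuracy - pe) / (1 - pe)         # about 0.5336

# McNemar's test only uses the two off-diagonal cells
mcnemar_p <- mcnemar.test(xtab)$p.value
```

Swapping the positive class only relabels which cell counts as a true positive, so none of these values change.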

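To make normal the positive class, just add positive = "normal" to the confusionMatrix call. Compare this with the previous output: the differences start at Sensitivity. Sensitivity and Specificity swap, as do Pos Pred Value and Neg Pred Value, while Prevalence, Detection Rate and Detection Prevalence are recomputed for the new positive class. Accuracy, Kappa and Balanced Accuracy are unchanged.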

confusionMatrix(xtab, positive = "normal")

Confusion Matrix and Statistics

          truth
pred       abnormal normal
  abnormal      231     32
  normal         27     54

               Accuracy : 0.8285          
                 95% CI : (0.7844, 0.8668)
    No Information Rate : 0.75            
    P-Value [Acc > NIR] : 0.0003097       

                  Kappa : 0.5336          
 Mcnemar's Test P-Value : 0.6025370       

            Sensitivity : 0.6279          
            Specificity : 0.8953          
         Pos Pred Value : 0.6667          
         Neg Pred Value : 0.8783          
             Prevalence : 0.2500          
         Detection Rate : 0.1570          
   Detection Prevalence : 0.2355          
      Balanced Accuracy : 0.7616          

       'Positive' Class : normal 
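To see why the class-specific rows swap, here is a small base-R sketch that computes them for either choice of positive class. class_stats() is a hypothetical helper written for illustration, not part of caret; it assumes a 2x2 table with predictions in rows and truth in columns, as in xtab above.

```r
# Counts from the table above (rows = pred, columns = truth)
xtab <- matrix(c(231, 27, 32, 54), nrow = 2,
               dimnames = list(pred  = c("abnormal", "normal"),
                               truth = c("abnormal", "normal")))

# Hypothetical helper (not caret's API): class-specific measures
# for the chosen positive class
class_stats <- function(tab, positive) {
  negative <- setdiff(colnames(tab), positive)
  tp <- tab[positive, positive]   # predicted positive, truly positive
  fn <- tab[negative, positive]   # predicted negative, truly positive
  fp <- tab[positive, negative]   # predicted positive, truly negative
  tn <- tab[negative, negative]   # predicted negative, truly negative
  n  <- sum(tab)
  c(sensitivity          = tp / (tp + fn),
    specificity          = tn / (tn + fp),
    pos_pred_value       = tp / (tp + fp),
    neg_pred_value       = tn / (tn + fn),
    prevalence           = (tp + fn) / n,
    detection_rate       = tp / n,
    detection_prevalence = (tp + fp) / n,
    balanced_accuracy    = (tp / (tp + fn) + tn / (tn + fp)) / 2)
}

abn <- class_stats(xtab, "abnormal")
nor <- class_stats(xtab, "normal")
round(rbind(abnormal = abn, normal = nor), 4)
```

Switching the positive class just exchanges the roles of tp/tn and fp/fn, which is why sensitivity and specificity trade places while their average (balanced accuracy) stays the same.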
phiver answered Sep 27 '22 21:09



Changing the positive class:

One effective way of doing this is to relevel the target variable.

For example, in the Breast Cancer Wisconsin dataset, the first level of Diagnosis is the default positive class. The levels of Diagnosis are:

cancer <- read.csv("breast-cancer-wisconsin.csv")
cancer$Diagnosis <- as.factor(cancer$Diagnosis)
levels(cancer$Diagnosis)
[1] "Benign"    "Malignant"

After performing the train/test split and fitting the model, the resulting confusion matrix and performance measures are:

Confusion Matrix and Statistics

           Actual
predicted   Benign Malignant
  Benign       115         7
  Malignant      2        80

               Accuracy : 0.9559
                 95% CI : (0.9179, 0.9796)
    No Information Rate : 0.5735
    P-Value [Acc > NIR] : <2e-16

                  Kappa : 0.9091
 Mcnemar's Test P-Value : 0.1824

            Sensitivity : 0.9829
            Specificity : 0.9195
         Pos Pred Value : 0.9426
         Neg Pred Value : 0.9756
             Prevalence : 0.5735
         Detection Rate : 0.5637
   Detection Prevalence : 0.5980
      Balanced Accuracy : 0.9512

       'Positive' Class : Benign

Note that the 'Positive' Class is Benign.

To change the positive class to "Malignant", use the relevel() function, which makes the given level the reference (first) level of the factor.
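A minimal base-R sketch of what relevel() does, using a small hypothetical stand-in for cancer$Diagnosis (the real dataset is not needed to see the effect). For a two-level factor, spelling out the order with factor(levels = ) gives the same result.

```r
# Hypothetical toy data standing in for cancer$Diagnosis
diagnosis <- factor(c("Benign", "Malignant", "Benign", "Malignant", "Benign"))
levels(diagnosis)            # "Benign" "Malignant" (alphabetical default)

# relevel() moves the chosen level to the front; the others keep their order
diagnosis_relevel <- relevel(diagnosis, ref = "Malignant")
levels(diagnosis_relevel)    # "Malignant" "Benign"

# Equivalent for two levels: give the full order explicitly
diagnosis_factor <- factor(diagnosis, levels = c("Malignant", "Benign"))
levels(diagnosis_factor)     # "Malignant" "Benign"
```

Only the level order changes; each observation keeps its label, so the first level (and therefore caret's default positive class) becomes Malignant.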

cancer$Diagnosis <- relevel(cancer$Diagnosis, ref = "Malignant")
levels(cancer$Diagnosis)
[1] "Malignant" "Benign"

After repeating the train/test split and model fitting with the new reference level, the confusion matrix and performance measures are:

Confusion Matrix and Statistics

           Actual
predicted   Malignant Benign
  Malignant        80      2
  Benign            7    115

               Accuracy : 0.9559
                 95% CI : (0.9179, 0.9796)
    No Information Rate : 0.5735
    P-Value [Acc > NIR] : <2e-16

                  Kappa : 0.9091
 Mcnemar's Test P-Value : 0.1824

            Sensitivity : 0.9195
            Specificity : 0.9829
         Pos Pred Value : 0.9756
         Neg Pred Value : 0.9426
             Prevalence : 0.4265
         Detection Rate : 0.3922
   Detection Prevalence : 0.4020
      Balanced Accuracy : 0.9512

       'Positive' Class : Malignant

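Here the positive class is Malignant. Accuracy, Kappa and Balanced Accuracy are the same as before, while Sensitivity and Specificity (and Pos Pred Value and Neg Pred Value) have swapped.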
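Both matrices contain the same counts, only with rows and columns reordered. A base-R sketch that rebuilds hypothetical predicted/actual vectors from the counts shown above (a reconstruction for illustration, not the original train/test split) and checks that relevelling both factors reorders the table and moves "first level" sensitivity from Benign to Malignant:

```r
# Rebuild prediction/actual vectors from the published counts
# (hypothetical reconstruction; the original data split is not shown)
actual    <- factor(rep(c("Benign", "Malignant", "Benign", "Malignant"),
                        times = c(115, 7, 2, 80)))
predicted <- factor(rep(c("Benign", "Benign", "Malignant", "Malignant"),
                        times = c(115, 7, 2, 80)))

# Default alphabetical levels: Benign comes first
tab_benign <- table(predicted, actual)

# Relevel both factors so Malignant comes first
tab_malignant <- table(predicted = relevel(predicted, ref = "Malignant"),
                       actual    = relevel(actual,    ref = "Malignant"))

# Sensitivity when the first level is treated as positive
sens_first <- function(tab) tab[1, 1] / sum(tab[, 1])

sens_first(tab_benign)      # 115 / 117
sens_first(tab_malignant)   # 80 / 87
```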

Moiz Ali Syed answered Sep 27 '22 22:09
