While using the caret package for machine learning, I am stuck with caret's default "positive" outcome choice, i.e. the first level of the outcome factor in binary classification problems.
The package says it can be set to the alternative level. Can anybody help me define the positive outcome?
Thank you.
The caret package (short for Classification And REgression Training) contains functions to streamline the model training process for complex regression and classification problems. It is one of the most comprehensive machine learning packages in R and covers the vast majority of classification and regression workflows.
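For context, here is a minimal sketch of what a caret training call typically looks like. The dataset, model method, and resampling settings below are arbitrary choices for illustration, not part of the question:

library(caret)

set.seed(42)
# illustrative only: a 5-fold cross-validated decision tree on the built-in iris data
fit <- train(Species ~ ., data = iris, method = "rpart",
             trControl = trainControl(method = "cv", number = 5))
fit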
The “no-information rate” is the largest proportion of the observed classes (there were more class 2 data than class 1 in this test set). A hypothesis test is also computed to evaluate whether the overall accuracy rate is greater than the rate of the largest class.
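As a rough sketch of that hypothesis test: caret reports a one-sided binomial test of the observed accuracy against the no-information rate. Using the counts from the example below (285 correct predictions out of 344 observations, largest class proportion 0.75), something like this reproduces the reported p-value:

# no-information rate = proportion of the largest observed class
nir <- 258 / 344                     # 0.75 in the example below
# one-sided test: is the observed accuracy (285/344) greater than the NIR?
binom.test(285, 344, p = nir, alternative = "greater")
# the p-value should match the "P-Value [Acc > NIR]" line reported by confusionMatrix()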
Look at this example, extended from the caret examples for confusionMatrix().
library(caret)

lvs <- c("normal", "abnormal")
truth <- factor(rep(lvs, times = c(86, 258)),
                levels = rev(lvs))
pred <- factor(
  c(rep(lvs, times = c(54, 32)),
    rep(lvs, times = c(27, 231))),
  levels = rev(lvs))
xtab <- table(pred, truth)
str(truth)
Factor w/ 2 levels "abnormal","normal": 2 2 2 2 2 2 2 2 2 2 ...
Because abnormal is the first level, this will be the default positive class
confusionMatrix(xtab)
Confusion Matrix and Statistics
          truth
pred       abnormal normal
  abnormal      231     32
  normal         27     54
Accuracy : 0.8285
95% CI : (0.7844, 0.8668)
No Information Rate : 0.75
P-Value [Acc > NIR] : 0.0003097
Kappa : 0.5336
Mcnemar's Test P-Value : 0.6025370
Sensitivity : 0.8953
Specificity : 0.6279
Pos Pred Value : 0.8783
Neg Pred Value : 0.6667
Prevalence : 0.7500
Detection Rate : 0.6715
Detection Prevalence : 0.7645
Balanced Accuracy : 0.7616
'Positive' Class : abnormal
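As a sanity check, the headline statistics above can be reproduced by hand from xtab, with abnormal as the positive class:

# sensitivity = true positives / all actual positives
231 / (231 + 27)      # 0.8953
# specificity = true negatives / all actual negatives
54  / (54 + 32)       # 0.6279
# positive predictive value = true positives / all predicted positives
231 / (231 + 32)      # 0.8783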
To change the positive class to normal, just add positive = "normal" to the confusionMatrix() call. Notice the differences from the previous output; they start at the sensitivity and the subsequent calculations.
confusionMatrix(xtab, positive = "normal")
Confusion Matrix and Statistics
          truth
pred       abnormal normal
  abnormal      231     32
  normal         27     54
Accuracy : 0.8285
95% CI : (0.7844, 0.8668)
No Information Rate : 0.75
P-Value [Acc > NIR] : 0.0003097
Kappa : 0.5336
Mcnemar's Test P-Value : 0.6025370
Sensitivity : 0.6279
Specificity : 0.8953
Pos Pred Value : 0.6667
Neg Pred Value : 0.8783
Prevalence : 0.2500
Detection Rate : 0.1570
Detection Prevalence : 0.2355
Balanced Accuracy : 0.7616
'Positive' Class : normal
Changing the positive class:
One straightforward way of doing this is by re-leveling the target variable.
For example, in the breast cancer Wisconsin dataset, the first level of Diagnosis determines the default positive class. The reference level of Diagnosis is:
cancer <- read.csv("breast-cancer-wisconsin.csv")
cancer$Diagnosis <- as.factor(cancer$Diagnosis)
levels(cancer$Diagnosis)
[1] "Benign" "Malignant"
After performing the test-train split and fitting a model, the resulting confusion matrix and performance measures are:
Confusion Matrix and Statistics
           Actual
predicted   Benign Malignant
  Benign       115         7
  Malignant      2        80
Accuracy : 0.9559
95% CI : (0.9179, 0.9796)
No Information Rate : 0.5735
P-Value [Acc > NIR] : <2e-16
Kappa : 0.9091
Mcnemar's Test P-Value : 0.1824
Sensitivity : 0.9829
Specificity : 0.9195
Pos Pred Value : 0.9426
Neg Pred Value : 0.9756
Prevalence : 0.5735
Detection Rate : 0.5637
Detection Prevalence : 0.5980
Balanced Accuracy : 0.9512
'Positive' Class : Benign
Note that the positive class is Benign.
Changing the positive class to "Malignant" can be done with the relevel() function, which changes the reference level of the variable.
cancer$Diagnosis <- relevel(cancer$Diagnosis, ref = "Malignant")
levels(cancer$Diagnosis)
[1] "Malignant" "Benign"
Again, after performing the test-train split and fitting the model, the confusion matrix and performance measures with the changed reference level are:
Confusion Matrix and Statistics
           Actual
predicted   Malignant Benign
  Malignant        80      2
  Benign            7    115
Accuracy : 0.9559
95% CI : (0.9179, 0.9796)
No Information Rate : 0.5735
P-Value [Acc > NIR] : <2e-16
Kappa : 0.9091
Mcnemar's Test P-Value : 0.1824
Sensitivity : 0.9195
Specificity : 0.9829
Pos Pred Value : 0.9756
Neg Pred Value : 0.9426
Prevalence : 0.4265
Detection Rate : 0.3922
Detection Prevalence : 0.4020
Balanced Accuracy : 0.9512
'Positive' Class : Malignant
Here the positive class is Malignant.