R package caret confusionMatrix with missing categories

Tags: r, missing-data, r-caret, confusion-matrix

I am using the confusionMatrix function in the R package caret to calculate statistics for my data. I pass my predictions and my actual values to the table function to build the table that confusionMatrix takes, like so:

table(predicted,actual)

However, there are multiple possible outcomes (e.g. A, B, C, D), and my predictions do not always represent all the possibilities (e.g. only A, B, D). The resulting output of the table function does not include the missing outcome and looks like this:

    A    B    C    D
A  n1   n2   n3   n4
B  n5   n6   n7   n8  
D  n9  n10  n11  n12
# Note how there is no corresponding row for `C`.

The confusionMatrix function can't handle the missing outcome and gives the error:

Error in !all.equal(nrow(data), ncol(data)) : invalid argument type

Is there a way I can use the table function differently to get the missing rows with zeros or use the confusionMatrix function differently so it will view missing outcomes as zero?

As a note: because I select my test data at random, a category is sometimes missing from the actual values as well, not just from the predictions. I don't believe this changes the solution.

454

asked Nov 09 '13 00:11

Barker

2 Answers

You can use union to give both factors the same set of levels:

library(caret)

# Sample Data
predicted <- c(1,2,1,2,1,2,1,2,3,4,3,4,6,5) # Levels 1,2,3,4,5,6
reference <- c(1,2,1,2,1,2,1,2,1,2,1,3,3,4) # Levels 1,2,3,4

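# Sorted union of all values seen in either vector, used as the common levels
u <- sort(union(predicted, reference))
# Named `tbl` rather than `t` to avoid masking base::t()
tbl <- table(factor(predicted, u), factor(reference, u))
confusionMatrix(tbl)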
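
The same idea carries over to labelled classes such as the A, B, C, D in the question. Below is a minimal base-R sketch, assuming the full set of classes is known in advance. The vector `classes` and the sample data are made up for illustration. Declaring the levels explicitly makes table() keep an all-zero row for a class that was never predicted.

```r
# Hypothetical class labels and sample data, for illustration only.
classes   <- c("A", "B", "C", "D")
predicted <- c("A", "B", "D", "A", "B", "D")   # "C" is never predicted
actual    <- c("A", "B", "C", "D", "B", "A")

# Fixing the same levels on both factors forces a square table,
# with zeros filled in for combinations that never occur.
tab <- table(predicted = factor(predicted, levels = classes),
             actual    = factor(actual,    levels = classes))
print(tab)
```

The resulting square table can then be handed to confusionMatrix in the same way as the table above.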

138

answered Dec 14 '22 23:12

Borealis

First note that confusionMatrix can be called as confusionMatrix(predicted, actual) in addition to being called with table objects. However, the function throws an error if predicted and actual (both regarded as factors) do not have the same number of levels.
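
For reference, here is a minimal sketch of the factor-based call. It assumes caret is installed and that giving both factors the same explicit levels is enough to avoid the error. The class labels and sample data are hypothetical.

```r
library(caret)  # assumes the caret package is installed

# Hypothetical sample data; "C" never appears among the predictions.
classes   <- c("A", "B", "C", "D")
predicted <- factor(c("A", "B", "D", "A", "B", "D", "A", "B"), levels = classes)
actual    <- factor(c("A", "B", "C", "D", "B", "D", "A", "C"), levels = classes)

# Both factors carry the same four levels, so no table() call is needed.
cm <- confusionMatrix(data = predicted, reference = actual)
print(cm$table)                # 4 x 4 table, including a zero row for "C"
print(cm$overall["Accuracy"])  # overall share of correct predictions
```

Per-class statistics for a class that is never predicted may come back as NA or NaN, since they involve a division by zero.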

This (and the fact that the caret package threw an error for me because its dependencies were not set up correctly) is why I'd suggest writing your own function:

# Create a confusion matrix from the given outcomes. Rows correspond to the
# predicted classes and columns to the actual classes.
# Assumes classes are coded as positive integers 1..k.
createConfusionMatrix <- function(act, pred) {
  # You've mentioned that neither actual nor predicted may give a complete
  # picture of the available classes, hence take the largest code in either:
  numClasses <- max(act, pred)
  # Group the predictions by actual class. Fixing the factor levels keeps a
  # column even for classes that never occur in `act`.
  byActual <- split(pred, factor(act, levels = seq_len(numClasses)))
  # Count predictions 1..numClasses within each group; absent classes give 0.
  sapply(byActual, tabulate, nbins = numClasses)
}

# Generate random data since you've not provided an actual example.
actual    <- sample(1:4, 1000, replace=TRUE)
predicted <- sample(c(1L,2L,4L), 1000, replace=TRUE)

print( createConfusionMatrix(actual, predicted) )

which will give you something like the following (the exact counts vary from run to run because the data are random):

      1  2  3  4
[1,] 85 87 90 77
[2,] 78 78 79 95
[3,]  0  0  0  0
[4,] 89 77 82 83
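
Summary statistics can be read off such a matrix directly. Here is a minimal sketch using a small made-up matrix `m` with the same layout (rows are predicted classes, columns are actual classes). The numbers are hypothetical and are not taken from the run above.

```r
# Hypothetical 3-class confusion matrix: rows = predicted, columns = actual.
# Class 3 is never predicted, so its row is all zeros.
m <- matrix(c(5, 1, 2,
              2, 6, 1,
              0, 0, 0),
            nrow = 3, byrow = TRUE)

accuracy  <- sum(diag(m)) / sum(m)   # share of correct predictions
recall    <- diag(m) / colSums(m)    # per actual class
precision <- diag(m) / rowSums(m)    # per predicted class; NaN if never predicted

print(accuracy)
print(recall)
print(precision)
```

A class that is never predicted has an undefined precision (0/0, reported as NaN), so it is worth deciding up front how such classes should be treated.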

answered Dec 14 '22 22:12

fotNelton
