Error in confusionMatrix: the data and reference factors must have the same number of levels

I've trained a tree model with R caret. I'm now trying to generate a confusion matrix and keep getting the following error:

Error in confusionMatrix.default(predictionsTree, testdata$catgeory) : the data and reference factors must have the same number of levels

prob <- 0.5 #Specify class split
singleSplit <- createDataPartition(modellingData2$category, p=prob,
                                   times=1, list=FALSE)
cvControl <- trainControl(method="repeatedcv", number=10, repeats=5)
traindata <- modellingData2[singleSplit,]
testdata <- modellingData2[-singleSplit,]
treeFit <- train(traindata$category~., data=traindata,
                 trControl=cvControl, method="rpart", tuneLength=10)
predictionsTree <- predict(treeFit, testdata)
confusionMatrix(predictionsTree, testdata$catgeory)

The error occurs when generating the confusion matrix. The levels are the same on both objects, and I can't figure out what the problem is. Their structure and levels are given below; they should be the same. Any help would be greatly appreciated, as it's driving me crazy!

> str(predictionsTree)
 Factor w/ 30 levels "16-Merchant Service Charge",..: 28 22 22 22 22 6 6 6 6 6 ...
> str(testdata$category)
 Factor w/ 30 levels "16-Merchant Service Charge",..: 30 30 7 7 7 7 7 30 7 7 ...

> levels(predictionsTree)
 [1] "16-Merchant Service Charge"   "17-Unpaid Cheque Fee"         "18-Gov. Stamp Duty"           "Misc"                         "26-Standard Transfer Charge" 
 [6] "29-Bank Giro Credit"          "3-Cheques Debit"              "32-Standing Order - Debit"    "33-Inter Branch Payment"      "34-International"            
[11] "35-Point of Sale"             "39-Direct Debits Received"    "4-Notified Bank Fees"         "40-Cash Lodged"               "42-International Receipts"   
[16] "46-Direct Debits Paid"        "56-Credit Card Receipts"      "57-Inter Branch"              "58-Unpaid Items"              "59-Inter Company Transfers"  
[21] "6-Notified Interest Credited" "61-Domestic"                  "64-Charge Refund"             "66-Inter Company Transfers"   "67-Suppliers"                
[26] "68-Payroll"                   "69-Domestic"                  "73-Credit Card Payments"      "82-CHAPS Fee"                 "Uncategorised"   

> levels(testdata$category)
 [1] "16-Merchant Service Charge"   "17-Unpaid Cheque Fee"         "18-Gov. Stamp Duty"           "Misc"                         "26-Standard Transfer Charge" 
 [6] "29-Bank Giro Credit"          "3-Cheques Debit"              "32-Standing Order - Debit"    "33-Inter Branch Payment"      "34-International"            
[11] "35-Point of Sale"             "39-Direct Debits Received"    "4-Notified Bank Fees"         "40-Cash Lodged"               "42-International Receipts"   
[16] "46-Direct Debits Paid"        "56-Credit Card Receipts"      "57-Inter Branch"              "58-Unpaid Items"              "59-Inter Company Transfers"  
[21] "6-Notified Interest Credited" "61-Domestic"                  "64-Charge Refund"             "66-Inter Company Transfers"   "67-Suppliers"                
[26] "68-Payroll"                   "69-Domestic"                  "73-Credit Card Payments"      "82-CHAPS Fee"                 "Uncategorised"       

user2987739 asked Jul 17 '14 10:07



4 Answers

Try using:

confusionMatrix(table(predictionsTree, testdata$category))

That worked for me.
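A minimal sketch of why the table form can help, using made-up vectors rather than the asker's data (`reference`, `raw_pred`, `prediction` and `tab` are hypothetical names). It assumes the mismatch comes from predictions that lack some of the reference levels; putting both vectors on the same level set gives a square table with matching row and column names.

```r
# Hypothetical stand-ins for testdata$category and the model's predictions.
reference <- factor(c("a", "b", "c", "c"), levels = c("a", "b", "c"))
raw_pred  <- c("a", "b", "a", "b")   # the model never predicts "c"

# Without explicit levels, factor() keeps only the values that occur.
nlevels(factor(raw_pred))            # 2, while the reference has 3

# Re-create the predictions on the reference's level set.
prediction <- factor(raw_pred, levels = levels(reference))

# The cross-tabulation is now square, with matching row and column names.
tab <- table(Prediction = prediction, Reference = reference)
dim(tab)
```

A table built this way is the kind of object the `confusionMatrix(table(...))` call above expects.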

Mayk Tulio answered Oct 30 '22 23:10


Maybe your model is not predicting a certain factor level. Use the table() function instead of confusionMatrix() to see if that is the problem.
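One more thing that may be worth checking here: the failing call in the question spells the column `catgeory`, while the `str()` output uses `category`. A misspelled column name makes `$` return NULL, and NULL has zero levels once coerced to a factor, which would produce the same level-count complaint. The sketch below shows both checks on made-up data (the data frame `toy` and its values are hypothetical, not the asker's).

```r
# Hypothetical toy data standing in for testdata.
toy <- data.frame(
  category = factor(c("a", "b", "c", "a"), levels = c("a", "b", "c")),
  amount   = c(10, 20, 30, 40)
)

# A misspelled column name does not raise an error; `$` just returns NULL.
misspelled <- toy$catgeory
is.null(misspelled)

# Coercing NULL to a factor gives zero levels, hence a level-count mismatch.
n_ref_levels <- nlevels(factor(misspelled))

# Pretend predictions that never use level "c".
pred <- factor(c("a", "b", "a", "a"), levels = levels(toy$category))

# table() shows which levels were never predicted (row total of 0).
counts <- table(Prediction = pred, Reference = toy$category)
never_predicted <- names(which(rowSums(counts) == 0))
never_predicted
```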

Red answered Oct 31 '22 00:10


Try specifying na.pass for the na.action option:

predictionsTree <- predict(treeFit, testdata, na.action = na.pass)
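A rough illustration of what the `na.action` choice changes, using base R's `model.frame()` on made-up data (the data frame `toy` is hypothetical). It assumes the test set has missing predictor values; with `na.omit` those rows are dropped, so the predictions would no longer line up row for row with the reference column, whereas `na.pass` keeps every row.

```r
# Hypothetical test data with one missing predictor value.
toy <- data.frame(
  category = factor(c("a", "b", "a", "b")),
  amount   = c(10, NA, 30, 40)
)

# na.omit drops the incomplete row; na.pass keeps all rows.
n_omit <- nrow(model.frame(category ~ amount, data = toy, na.action = na.omit))
n_pass <- nrow(model.frame(category ~ amount, data = toy, na.action = na.pass))

# Which rows contain at least one NA?
rows_with_na <- which(!complete.cases(toy))
```

Comparing `length(predictionsTree)` with `nrow(testdata)` is a quick way to see whether this applies to your data.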

aristotll answered Oct 31 '22 00:10


Put them into a data frame and then pass them to the confusionMatrix function:

predicted <- factor(predict(treeFit, testdata))
real <- factor(testdata$category)  # note the spelling: category, not catgeory

# Stack both vectors in one data frame; rbind() merges the factor levels
my_data1 <- data.frame(data = predicted, type = "prediction")
my_data2 <- data.frame(data = real, type = "real")
my_data3 <- rbind(my_data1, my_data2)

# Check if the levels are identical
identical(levels(my_data3[my_data3$type == "prediction", 1]),
          levels(my_data3[my_data3$type == "real", 1]))

confusionMatrix(my_data3[my_data3$type == "prediction", 1],
                my_data3[my_data3$type == "real", 1],
                dnn = c("Prediction", "Reference"))

S. Think answered Oct 30 '22 23:10