I've trained a tree model with R caret. I'm now trying to generate a confusion matrix and keep getting the following error:
Error in confusionMatrix.default(predictionsTree, testdata$catgeory) : the data and reference factors must have the same number of levels
prob <- 0.5 #Specify class split
singleSplit <- createDataPartition(modellingData2$category, p=prob,
times=1, list=FALSE)
cvControl <- trainControl(method="repeatedcv", number=10, repeats=5)
traindata <- modellingData2[singleSplit,]
testdata <- modellingData2[-singleSplit,]
treeFit <- train(traindata$category~., data=traindata,
trControl=cvControl, method="rpart", tuneLength=10)
predictionsTree <- predict(treeFit, testdata)
confusionMatrix(predictionsTree, testdata$catgeory)
The error occurs when generating the confusion matrix. The levels are the same on both objects. I cant figure out what the problem is. Their structure and levels are given below. They should be the same. Any help would be greatly appreciated as its making me cracked!!
> str(predictionsTree)
Factor w/ 30 levels "16-Merchant Service Charge",..: 28 22 22 22 22 6 6 6 6 6 ...
> str(testdata$category)
Factor w/ 30 levels "16-Merchant Service Charge",..: 30 30 7 7 7 7 7 30 7 7 ...
> levels(predictionsTree)
[1] "16-Merchant Service Charge" "17-Unpaid Cheque Fee" "18-Gov. Stamp Duty" "Misc" "26-Standard Transfer Charge"
[6] "29-Bank Giro Credit" "3-Cheques Debit" "32-Standing Order - Debit" "33-Inter Branch Payment" "34-International"
[11] "35-Point of Sale" "39-Direct Debits Received" "4-Notified Bank Fees" "40-Cash Lodged" "42-International Receipts"
[16] "46-Direct Debits Paid" "56-Credit Card Receipts" "57-Inter Branch" "58-Unpaid Items" "59-Inter Company Transfers"
[21] "6-Notified Interest Credited" "61-Domestic" "64-Charge Refund" "66-Inter Company Transfers" "67-Suppliers"
[26] "68-Payroll" "69-Domestic" "73-Credit Card Payments" "82-CHAPS Fee" "Uncategorised"
> levels(testdata$category)
[1] "16-Merchant Service Charge" "17-Unpaid Cheque Fee" "18-Gov. Stamp Duty" "Misc" "26-Standard Transfer Charge"
[6] "29-Bank Giro Credit" "3-Cheques Debit" "32-Standing Order - Debit" "33-Inter Branch Payment" "34-International"
[11] "35-Point of Sale" "39-Direct Debits Received" "4-Notified Bank Fees" "40-Cash Lodged" "42-International Receipts"
[16] "46-Direct Debits Paid" "56-Credit Card Receipts" "57-Inter Branch" "58-Unpaid Items" "59-Inter Company Transfers"
[21] "6-Notified Interest Credited" "61-Domestic" "64-Charge Refund" "66-Inter Company Transfers" "67-Suppliers"
[26] "68-Payroll" "69-Domestic" "73-Credit Card Payments" "82-CHAPS Fee" "Uncategorised"
Error: data and reference should be factors with the same levels. means that you need to give it factors as inputs ( train[[predict]] > c is not a factor). Try using factor(ifelse(...), levels) instead).
3.1 Confusion matrix Confusion matrix visualization. True positive (TP): Observation is predicted positive and is actually positive. False positive (FP): Observation is predicted positive and is actually negative. True negative (TN): Observation is predicted negative and is actually negative.
Use the autoplot Function to Visualize Confusion Matrix in R In this case, we construct the matrix with the conf_mat function that produces an object of the conf_mat class that can be directly passed as the first argument to the autoplot function.
Confusion Matrix Levels Error with non unique test data Related 29 Error in Confusion Matrix : the data and reference factors must have the same number of levels 5 Error: `data` and `reference` should be factors with the same levels 2 Error: `data` and `reference` should be factors with the same levels. Using confusionMatrix (caret) 0
Whenever you face an error: `data` and `reference` should be factors with the same levels, make sure that both the true values and the prediction values are of “factor” data-type. Here both pred and testing$Final must be of datatype factor. Here testing$Final is of type int, convert it to factor and then build the confusion matrix.
Whenever you try to build a confusion matrix, make sure that both the true values and prediction values are of factor datatype. Here both pred and testing$Final must be of type factor. Instead of check for levels, check the type of both the variables and convert them to factor if they are not.
Then,if your model fit is predicting some incorrect level,then it is better to use tables confusionMatrix(table(Arg1, Arg2)) Share Improve this answer Follow answered Jul 17, 2019 at 13:03 Sanjay NandakumarSanjay Nandakumar 32311 silver badge1010 bronze badges Add a comment | 0
Try use:
confusionMatrix(table(Argument 1, Argument 2))
Thats worked for me.
Maybe your model is not predicting a certain factor.
Use the table()
function instead of confusionMatrix()
to see if that is the problem.
Try specifying na.pass
for the na.action
option:
predictionsTree <- predict(treeFit, testdata,na.action = na.pass)
Change them into a data frame and then use them in confusionMatrix function:
pridicted <- factor(predict(treeFit, testdata))
real <- factor(testdata$catgeory)
my_data1 <- data.frame(data = pridicted, type = "prediction")
my_data2 <- data.frame(data = real, type = "real")
my_data3 <- rbind(my_data1,my_data2)
# Check if the levels are identical
identical(levels(my_data3[my_data3$type == "prediction",1]) , levels(my_data3[my_data3$type == "real",1]))
confusionMatrix(my_data3[my_data3$type == "prediction",1], my_data3[my_data3$type == "real",1], dnn = c("Prediction", "Reference"))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With