
Logistic regression confusion matrix problem

I tried computing a confusion matrix for my glm model, but I keep getting:

Error: data and reference should be factors with the same levels.

Below is my model:

model3 <- glm(winner ~ srs.1 + srs.2, data = train_set, family = binomial)
confusionMatrix(table(predict(model3, newdata=test_set, type="response")) >= 0.5,
                      train_set$winner == 1)

The winner variable contains team1 and team2.
srs.1 and srs.2 are numeric values.

What is my problem here?

John Sean asked Nov 27 '25 23:11



1 Answer

I assume your winner label is binary, coded as 0 and 1, so let's use the example below:

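library(caret)
set.seed(111)

# Simulate two numeric predictors
data = data.frame(
  srs.1 = rnorm(200),
  srs.2 = rnorm(200)
)

# Binary label: 1 when both predictors have the same sign, else 0
data$winner = ifelse(data$srs.1*data$srs.2 > 0,1,0)

# Hold out 50 of the 200 rows as a test set
idx = sample(nrow(data),150)
train_set = data[idx,]
test_set = data[-idx,]

# Fit the logistic regression on the training set only
model3 <- glm(winner ~ srs.1 + srs.2, data = train_set, family = binomial)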

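As in your attempt, we classify an observation as 1 when its predicted probability is greater than 0.5 and as 0 otherwise. Using table() was the right idea, but it has to cross-tabulate the predicted labels against the reference labels, not wrap the raw probabilities. Note also that the predictions and the reference must both come from the same data set, either test_set or train_set; your call mixed predictions on test_set with labels from train_set: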

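# Predicted class on the test set: 1 if the probability exceeds 0.5, else 0
pred = as.numeric(predict(model3, newdata=test_set, type="response")>0.5)
# True labels from the same test set
ref = test_set$winner

# Cross-tabulate predictions against the truth and pass the table to caret
confusionMatrix(table(pred,ref))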

Confusion Matrix and Statistics

    ref
pred  0  1
   0 12  5
   1 19 14

               Accuracy : 0.52            
                 95% CI : (0.3742, 0.6634)
    No Information Rate : 0.62            
    P-Value [Acc > NIR] : 0.943973        

                  Kappa : 0.1085  
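
If winner really holds the labels team1 and team2 rather than 0 and 1, the error message points at another route: pass two factors with identical levels straight to confusionMatrix(). Below is a minimal sketch of that, reusing the simulated data from above. The mapping of the simulated outcome to "team1"/"team2" is my own assumption for illustration; your real data will differ. It relies on glm() treating the first factor level as failure, so the predicted probability refers to the second level ("team2" here).

```r
library(caret)
set.seed(111)

# Simulated stand-in for the real data (assumption: labels are "team1"/"team2")
data <- data.frame(srs.1 = rnorm(200), srs.2 = rnorm(200))
data$winner <- factor(ifelse(data$srs.1 * data$srs.2 > 0, "team2", "team1"),
                      levels = c("team1", "team2"))

idx <- sample(nrow(data), 150)
train_set <- data[idx, ]
test_set  <- data[-idx, ]

# With a factor response, binomial glm() models P(winner == "team2"),
# because the first level ("team1") is treated as failure
model3 <- glm(winner ~ srs.1 + srs.2, data = train_set, family = binomial)
prob <- predict(model3, newdata = test_set, type = "response")

# Turn probabilities back into labels, forcing the same levels as the reference
pred <- factor(ifelse(prob > 0.5, "team2", "team1"),
               levels = levels(test_set$winner))

# Both arguments are now factors with the same levels, taken from the same data set
cm <- confusionMatrix(data = pred, reference = test_set$winner, positive = "team2")
cm
```

Setting levels explicitly on pred matters: if every prediction happened to fall on one side of 0.5, a plain factor() call would produce a single level and the same error would come back.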
StupidWolf answered Nov 29 '25 15:11



