I have a data frame. The first column contains my model's predictive score (ranging from 0 to 100; smaller values are expected to be class A, larger values class B), and the second column contains the actual classification of each entry (either "class A" or "class B").
How can I get a confusion matrix in R for different cutoff values, since I cannot yet decide whether values < 20 or < 50 should be defined as class A?
How can I do this comparison efficiently in R?
A confusion matrix is a table that cross-tabulates the predictions against the actual values. It has two dimensions: one for the predicted classes and one for the actual classes.
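If you want the rows and columns labelled with class names rather than TRUE/FALSE, one option is to convert the scores to predicted classes first and pass named arguments to `table()`, which then become the dimension names. A minimal sketch with a small made-up set of scores (the values and the cutoff of 50 are illustrative only):

```r
# Hypothetical example data: scores and actual classes for six entries
scores <- c(10, 35, 60, 45, 80, 95)
actual <- c("class A", "class A", "class A", "class B", "class B", "class B")

# Scores below the cutoff are predicted as class A, the rest as class B
cutoff <- 50
predicted <- ifelse(scores < cutoff, "class A", "class B")

# Named arguments to table() label the two dimensions
cm <- table(predicted = predicted, actual = actual)
print(cm)
```

Here one class A entry (score 60) and one class B entry (score 45) end up off the diagonal, i.e. misclassified.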
There are a number of ways to do this; a reproducible example with your data would have been helpful. Here is one with simulated data:
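set.seed(12345)  # make the simulated data reproducible
# Simulated scores: class A drawn from 0-75, class B from 25-100
test <- data.frame(pred = c(runif(50, 0, 75), runif(50, 25, 100)), group = c(rep("A", 50), rep("B", 50)))
table(test$pred < 50, test$group)  # rows: score below 50? columns: actual class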
gives
         A  B
  FALSE 18 34
  TRUE  32 16
So with a cutoff of 50, 32 A's scored under 50 and 34 B's scored 50 or above (correctly classified), while 18 A's scored 50 or above and 16 B's scored under 50 (misclassified).
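Overall accuracy is the fraction of entries that land on the correct side of the cutoff. As a sketch, the counts from the first table are typed in by hand here (an assumption made so the snippet does not depend on reproducing the simulation exactly):

```r
# Counts copied from the first table: rows = (pred < 50), columns = actual group
cm <- matrix(c(18, 32, 34, 16), nrow = 2,
             dimnames = list(below50 = c("FALSE", "TRUE"), group = c("A", "B")))

# Correct: A scored under 50 (TRUE row), B scored 50 or above (FALSE row)
correct <- cm["TRUE", "A"] + cm["FALSE", "B"]
accuracy <- correct / sum(cm)                 # 66 / 100 = 0.66

# Fraction of each actual class that landed on the wrong side of the cutoff
error_A <- cm["FALSE", "A"] / sum(cm[, "A"])  # 18 / 50 = 0.36
error_B <- cm["TRUE", "B"] / sum(cm[, "B"])   # 16 / 50 = 0.32

print(c(accuracy = accuracy, error_A = error_A, error_B = error_B))
```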
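If the two groups overlap less (class A scores drawn from 0-60, class B from 40-100):
set.seed(12345)
test <- data.frame(pred = c(runif(50, 0, 60), runif(50, 40, 100)), group = c(rep("A", 50), rep("B", 50)))
table(test$pred < 50, test$group)  # same cutoff of 50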
gives
         A  B
  FALSE  8 40
  TRUE  42 10
In this example, because the simulated scores of the two groups overlap less, the classification is much better.
The cutoff of 50 can then be changed to any value you want to test, e.g. 20 or 30, and the resulting tables compared.
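To compare several cutoffs in one go, one option is to loop over a vector of cutoffs with `lapply()`, keeping one table per cutoff, and then summarise each table with `sapply()`. Wrapping the comparison in `factor(..., levels = c(FALSE, TRUE))` keeps both rows even when a cutoff puts every entry on the same side (e.g. a cutoff of 0), which plain `table()` would otherwise drop. A sketch that recreates the second simulated data set so it runs on its own; the cutoffs chosen are arbitrary:

```r
set.seed(12345)
test <- data.frame(pred = c(runif(50, 0, 60), runif(50, 40, 100)),
                   group = c(rep("A", 50), rep("B", 50)))

cutoffs <- c(0, 20, 30, 50, 70)  # arbitrary candidate cutoffs

# One 2 x 2 confusion matrix per cutoff; fixed factor levels keep both rows
tables <- lapply(cutoffs, function(cut) {
  table(below = factor(test$pred < cut, levels = c(FALSE, TRUE)),
        group = test$group)
})
names(tables) <- paste0("cutoff_", cutoffs)

# Accuracy: A entries below the cutoff plus B entries at or above it
accuracy <- sapply(tables, function(tab) {
  (tab["TRUE", "A"] + tab["FALSE", "B"]) / sum(tab)
})

print(tables[["cutoff_50"]])
print(data.frame(cutoff = cutoffs, accuracy = accuracy))
```

Any other summary (per-class error rates, sensitivity, etc.) can be computed the same way by swapping the function passed to `sapply()`.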