How to read the classifier confusion matrix in WEKA

Sorry, I am new to WEKA and just learning.

In my decision tree (J48) classifier output, there is a confusion Matrix:

   a    b    <----- classified as
 130    8    a = functional
  15  150    b = non-functional
  • How do I read this matrix? What's the difference between a & b?
  • Also, can anyone explain to me what domain values are?
asked Mar 05 '13 by JakeSays

People also ask

How do you read a 4x4 confusion matrix?

In your case, the 4x4 matrix means your predicted variable has four distinct values, namely AGN, BeXRB, HMXB, and SNR. The correctly classified instances lie on the diagonal running from top-left to bottom-right; every value off that diagonal is a misclassification.

How do you calculate classification accuracy in Weka?

The correctly classified instances are reported in the Summary section of the output (just above the per-class accuracy details). That line shows both a count (the number of correctly classified instances) and a percentage, which is the overall accuracy.
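If you want those same numbers programmatically rather than from the GUI, here is a minimal sketch assuming the standard weka.classifiers.Evaluation API; "mydata.arff" is a placeholder for your own dataset and the 10-fold cross-validation setup is just an example:

import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class WekaAccuracyDemo {
    public static void main(String[] args) throws Exception {
        // "mydata.arff" is a placeholder for your own ARFF file
        Instances data = DataSource.read("mydata.arff");
        data.setClassIndex(data.numAttributes() - 1);   // last attribute is the class

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new J48(), data, 10, new Random(1));

        System.out.println(eval.toSummaryString());   // "Correctly Classified Instances ..."
        System.out.println(eval.toMatrixString());    // the confusion matrix block
        System.out.printf("Accuracy: %.2f%%%n", eval.pctCorrect());
    }
}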


2 Answers

Have you read the wikipedia page on confusion matrices? The text around the matrix is arranged slightly differently in their example (row labels on the left instead of on the right), but you read it just the same.

The row indicates the true class, the column indicates the classifier output. Each entry, then, gives the number of instances of <row> that were classified as <column>. In your example, 15 Bs were (incorrectly) classified as As, 150 Bs were correctly classified as Bs, etc.

As a result, all correct classifications are on the top-left to bottom-right diagonal. Everything off that diagonal is an incorrect classification of some sort.

Edit: The Wikipedia page has since switched the rows and columns around. This happens. When studying a confusion matrix, always check the labels to see whether it puts true classes in rows and predicted classes in columns, or the other way around.
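To see the rows-are-truth, columns-are-predictions convention concretely, here is a minimal, WEKA-independent sketch in plain Java; the two label arrays are made up purely for illustration:

public class ConfusionMatrixDemo {
    public static void main(String[] args) {
        // 0 = class "a", 1 = class "b"; tiny made-up label arrays for illustration
        int[] actual    = {0, 0, 0, 1, 1, 1, 1, 0};
        int[] predicted = {0, 0, 1, 1, 1, 0, 1, 0};

        int[][] matrix = new int[2][2];
        for (int i = 0; i < actual.length; i++) {
            matrix[actual[i]][predicted[i]]++;  // row = true class, column = predicted class
        }

        System.out.println("        pred a  pred b");
        System.out.printf("true a  %6d  %6d%n", matrix[0][0], matrix[0][1]);
        System.out.printf("true b  %6d  %6d%n", matrix[1][0], matrix[1][1]);
    }
}

Every instance lands in exactly one cell, and only the diagonal cells (true a / predicted a, true b / predicted b) count as correct.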

answered Oct 11 '22 by Junuxx


I'd put it this way:

The confusion matrix is Weka's report on how well this J48 model performs: what it gets right and what it gets wrong.

In your data, the target variable was either "functional" or "non-functional"; the labels on the right side of the matrix tell you that column "a" is functional and column "b" is non-functional.

The columns tell you how your model classified your samples - it's what the model predicted:

  • The first column contains all the samples which your model thinks are "a" - 145 of them, total
  • The second column contains all the samples which your model thinks are "b" - 158 of them

The rows, on the other hand, represent reality:

  • The first row contains all the samples which really are "a" - 138 of them, total
  • The second row contains all the samples which really are "b" - 165 of them

Knowing the columns and rows, you can dig into the details:

  • Top left, 130, are things your model thinks are "a" which really are "a" <- these were correct
  • Bottom left, 15, are things your model thinks are "a" but which are really "b" <- one kind of error
  • Top right, 8, are things your model thinks are "b" but which really are "a" <- another kind of error
  • Bottom right, 150 are things your model thinks are "b" which really are "b"

So top-left and bottom-right of the matrix are showing things your model gets right.

Bottom-left and top-right of the matrix are showing where your model is confused.
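As a sanity check, here is a small plain-Java sketch that reproduces the column and row totals above from the four counts in the posted matrix, and derives the overall accuracy from them:

public class MatrixTotals {
    public static void main(String[] args) {
        // The four counts from the matrix in the question
        int aAsA = 130, aAsB = 8;    // first row: really "a"
        int bAsA = 15,  bAsB = 150;  // second row: really "b"

        System.out.println("Predicted a (col 1): " + (aAsA + bAsA)); // 145
        System.out.println("Predicted b (col 2): " + (aAsB + bAsB)); // 158
        System.out.println("Really a    (row 1): " + (aAsA + aAsB)); // 138
        System.out.println("Really b    (row 2): " + (bAsA + bAsB)); // 165

        int total = aAsA + aAsB + bAsA + bAsB;                       // 303 instances
        System.out.printf("Accuracy: %.1f%%%n", 100.0 * (aAsA + bAsB) / total); // ~92.4%
    }
}

The accuracy here is just the diagonal (130 + 150) divided by all 303 instances, which matches the "Correctly Classified Instances" percentage Weka prints in its summary.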

answered Oct 11 '22 by Mental Nomad