How to read the classifier confusion matrix in WEKA

Sorry, I am new to WEKA and just learning.

In my decision tree (J48) classifier output, there is a confusion Matrix:

   a    b    <----- classified as
 130    8    a = functional
  15  150    b = non-functional
  • How do I read this matrix? What's the difference between a & b?
  • Also, can anyone explain to me what domain values are?
asked Mar 05 '13 by JakeSays

People also ask

How do you read a 4x4 confusion matrix?

In your case, the 4x4 matrix means your predicted variable has four distinct values, namely AGN, BeXRB, HMXB, and SNR. The correctly classified instances lie on the diagonal running from top-left to bottom-right; every value off that diagonal is a misclassification.

How do you calculate classification accuracy in Weka?

The correctly classified instances are reported in the Summary section of the output (just above the per-class accuracy details). That line shows both a count (the number of correctly classified instances) and a percentage, which is the overall accuracy.
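If you want those same numbers programmatically rather than from the GUI, here is a minimal sketch assuming the standard weka.classifiers.Evaluation API; "mydata.arff" is a placeholder for your own dataset and the 10-fold cross-validation setup is just an example:

import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class WekaAccuracyDemo {
    public static void main(String[] args) throws Exception {
        // "mydata.arff" is a placeholder for your own ARFF file
        Instances data = DataSource.read("mydata.arff");
        data.setClassIndex(data.numAttributes() - 1);   // last attribute is the class

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new J48(), data, 10, new Random(1));

        System.out.println(eval.toSummaryString());   // "Correctly Classified Instances ..."
        System.out.println(eval.toMatrixString());    // the confusion matrix block
        System.out.printf("Accuracy: %.2f%%%n", eval.pctCorrect());
    }
}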


2 Answers

Have you read the wikipedia page on confusion matrices? The text around the matrix is arranged slightly differently in their example (row labels on the left instead of on the right), but you read it just the same.

The row indicates the true class, the column indicates the classifier output. Each entry, then, gives the number of instances of <row> that were classified as <column>. In your example, 15 Bs were (incorrectly) classified as As, 150 Bs were correctly classified as Bs, etc.

As a result, all correct classifications are on the top-left to bottom-right diagonal. Everything off that diagonal is an incorrect classification of some sort.

Edit: The Wikipedia page has since switched the rows and columns around. This happens. When studying a confusion matrix, always check the labels to see whether it puts true classes in rows and predicted classes in columns, or the other way around.
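To see the rows-are-truth, columns-are-predictions convention concretely, here is a minimal, WEKA-independent sketch in plain Java; the two label arrays are made up purely for illustration:

public class ConfusionMatrixDemo {
    public static void main(String[] args) {
        // 0 = class "a", 1 = class "b"; tiny made-up label arrays for illustration
        int[] actual    = {0, 0, 0, 1, 1, 1, 1, 0};
        int[] predicted = {0, 0, 1, 1, 1, 0, 1, 0};

        int[][] matrix = new int[2][2];
        for (int i = 0; i < actual.length; i++) {
            matrix[actual[i]][predicted[i]]++;  // row = true class, column = predicted class
        }

        System.out.println("        pred a  pred b");
        System.out.printf("true a  %6d  %6d%n", matrix[0][0], matrix[0][1]);
        System.out.printf("true b  %6d  %6d%n", matrix[1][0], matrix[1][1]);
    }
}

Every instance lands in exactly one cell, and only the diagonal cells (true a / predicted a, true b / predicted b) count as correct.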

answered Oct 11 '22 by Junuxx


I'd put it this way:

The confusion matrix is Weka's report on how well this J48 model performs: what it gets right and what it gets wrong.

In your data, the target variable was either "functional" or "non-functional"; the labels on the right side of the matrix tell you that column "a" is functional and column "b" is non-functional.

The columns tell you how your model classified your samples - it's what the model predicted:

  • The first column contains all the samples which your model thinks are "a" - 145 of them, total
  • The second column contains all the samples which your model thinks are "b" - 158 of them

The rows, on the other hand, represent reality:

  • The first row contains all the samples which really are "a" - 138 of them, total
  • The second row contains all the samples which really are "b" - 165 of them

Knowing the columns and rows, you can dig into the details:

  • Top left, 130, are things your model thinks are "a" which really are "a" <- these were correct
  • Bottom left, 15, are things your model thinks are "a" but which are really "b" <- one kind of error
  • Top right, 8, are things your model thinks are "b" but which really are "a" <- another kind of error
  • Bottom right, 150 are things your model thinks are "b" which really are "b"

So top-left and bottom-right of the matrix are showing things your model gets right.

Bottom-left and top-right of the matrix are showing where your model is confused.
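As a sanity check, here is a small plain-Java sketch that reproduces the column and row totals above from the four counts in the posted matrix, and derives the overall accuracy from them:

public class MatrixTotals {
    public static void main(String[] args) {
        // The four counts from the matrix in the question
        int aAsA = 130, aAsB = 8;    // first row: really "a"
        int bAsA = 15,  bAsB = 150;  // second row: really "b"

        System.out.println("Predicted a (col 1): " + (aAsA + bAsA)); // 145
        System.out.println("Predicted b (col 2): " + (aAsB + bAsB)); // 158
        System.out.println("Really a    (row 1): " + (aAsA + aAsB)); // 138
        System.out.println("Really b    (row 2): " + (bAsA + bAsB)); // 165

        int total = aAsA + aAsB + bAsA + bAsB;                       // 303 instances
        System.out.printf("Accuracy: %.1f%%%n", 100.0 * (aAsA + bAsB) / total); // ~92.4%
    }
}

The accuracy here is just the diagonal (130 + 150) divided by all 303 instances, which matches the "Correctly Classified Instances" percentage Weka prints in its summary.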

answered Oct 11 '22 by Mental Nomad