How do we interpret the cost matrix in WEKA? If I have 2 classes to predict (class 0 and class 1) and want to penalize classification of class 0 as class 1 more (say double the penalty), what exactly is the matrix format?
Is it:
0 10
20 0
or is it
0 20
10 0
The confusion comes from the following two references:
1) The JavaDoc for Weka CostMatrix says:
The element at position i,j in the matrix is the penalty for classifying an instance of class j as class i.
2) However, the answer in this post seems to indicate otherwise.
http://weka.8497.n7.nabble.com/cost-matrix-td5821.html
Given the first cost matrix, the post says "Misclassifying an instance of class 0 incurs a cost of 10. Misclassifying an instance of class 1 is twice as costly."
Thanks.
I know my answer is coming very late, but it might help somebody so here it is:
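To boost the cost of classifying an item of class 0 as class 1, the correct format is the second one. In other words, rows correspond to the actual class and columns to the predicted class, the same layout as the confusion matrix.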
The evidence:
Cost Matrix I used:
0 1.0
1000.0 0
Confusion matrix (from cross-validation):
a b <-- classified as
565 20 | a = ignored
54 204 | b = not_ignored
Cross-validation output:
...
Total Cost 54020
...
That's a cost of 54 * 1000 + 20 * 1 = 54020, which matches the confusion matrix above.
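To double-check the arithmetic, here is a minimal sketch in plain Python (this is not WEKA's API; `total_cost` and `transpose` are names I made up for illustration). It assumes rows are the actual class and columns the predicted class, as in the confusion matrix printed above, and sums cost times count cell by cell. It also shows what the total would have been if the cost matrix were read the other way round.

```python
# Cost matrix exactly as entered (rows assumed = actual class, columns = predicted class).
cost_matrix = [
    [0.0, 1.0],     # actual a (ignored):     predicted a, predicted b
    [1000.0, 0.0],  # actual b (not_ignored): predicted a, predicted b
]

# Confusion matrix from the cross-validation output above.
confusion_matrix = [
    [565, 20],   # a = ignored
    [54, 204],   # b = not_ignored
]


def total_cost(cost, confusion):
    """Sum cost[i][j] * confusion[i][j] over every cell."""
    return sum(
        c * n
        for cost_row, conf_row in zip(cost, confusion)
        for c, n in zip(cost_row, conf_row)
    )


def transpose(matrix):
    """Swap rows and columns (hypothetical helper for the comparison)."""
    return [list(column) for column in zip(*matrix)]


# Rows = actual class: 54 * 1000 + 20 * 1
print(total_cost(cost_matrix, confusion_matrix))             # 54020.0

# Opposite reading (rows = predicted class): 20 * 1000 + 54 * 1
print(total_cost(transpose(cost_matrix), confusion_matrix))  # 20054.0
```

Only the first reading reproduces the reported Total Cost of 54020, which is why the second matrix in the question is the one to use.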