Definition of error rate in classification, and why some researchers report error rate instead of accuracy

What is the exact definition of error rate in classification, and why do some researchers report error rate instead of accuracy? I'm trying to compare my text classification results with other methods in the literature, but those papers report error rate rather than accuracy, and I can't find the exact definition/equation to compute the error rate of my method.

asked Oct 18 '18 by parvaneh shayegh

1 Answer

For classification, your output is discrete (as if you were putting items into buckets), so accuracy has a very straightforward definition:

accuracy = (# classified correct) / (# classified total)

Error rate is equally simple:

error rate = 1 - accuracy = 1 - (# classified correct) / (# classified total)

= (# classified incorrect) / (# classified total)
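
As a quick sanity check, here is a minimal Python sketch of both quantities computed on a small, made-up set of labels (the y_true / y_pred lists are hypothetical):

    # Accuracy and error rate for a discrete classifier.
    # y_true / y_pred are hypothetical example labels.
    y_true = ["spam", "ham", "spam", "ham", "spam"]
    y_pred = ["spam", "ham", "ham",  "ham", "spam"]

    correct = sum(t == p for t, p in zip(y_true, y_pred))
    total = len(y_true)

    accuracy = correct / total     # (# classified correct) / (# classified total)
    error_rate = 1 - accuracy      # (# classified incorrect) / (# classified total)

    print(f"accuracy = {accuracy:.2f}, error rate = {error_rate:.2f}")
    # prints: accuracy = 0.80, error rate = 0.20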

Note that things are more complicated for tasks with continuous output. If, instead of placing items into buckets, I ask a model to place items on a number line, accuracy is no longer a matter of "right" and "wrong" but of how close the model is to right. That closeness could be summarized as an average, a median, etc. More complex measures differ mainly in how heavily they weigh an error as its size increases. Perhaps being off by a little is far less bad than being off by a lot, in which case a root-mean-square error measure is appropriate. On the other hand, it may be that being off by more than a small amount is bad regardless of whether the miss is small or large, in which case a logarithmic error measure is a better fit.
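
To make the contrast concrete, here is a small standard-library sketch of a root-mean-square error and one common log-based measure, mean squared logarithmic error (the y_true / y_pred values are made up):

    import math

    # Hypothetical continuous targets and predictions.
    y_true = [2.0, 5.0, 10.0, 4.0]
    y_pred = [2.5, 4.0, 13.0, 4.0]

    n = len(y_true)

    # Root-mean-square error: deviations are penalized quadratically,
    # so a single large miss dominates the score.
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

    # Mean squared logarithmic error: differences are taken on a log scale,
    # so once a prediction is noticeably off, being off by even more adds
    # comparatively little extra penalty.
    msle = sum((math.log1p(t) - math.log1p(p)) ** 2
               for t, p in zip(y_true, y_pred)) / n

    print(f"RMSE = {rmse:.3f}, MSLE = {msle:.3f}")

Which measure is the "right" one depends entirely on how costly large misses are in your application.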


To answer the last part of your question: in the discrete case, why would one choose accuracy vs. error? Optics is one thing: "99% accurate" sends a different psychological message than "has an error rate of 1%". Furthermore, an increase in accuracy from 99% to 99.9% is a relative improvement of less than one percent (0.9 percentage points), but a decrease in error from 1% to 0.1% is a 90% reduction in error, even though the two describe the same real-world change.

Otherwise, it may be personal preference or writing style.

EDIT: you may also be interested in this post on the Statistics Stack Exchange

answered Sep 21 '22 by MyStackRunnethOver