I'm wondering whether a Bayes classifier makes sense for an application where the same phrase, "served cold" for example, is "good" when associated with some things (beer, soda) but "bad" when associated with others (steak, pizza, burger).
What I'm wondering is whether training a Bayes classifier that "beer cold" and "soda cold" are "good" cancels out training it that "steak served cold" and "burger served cold" are "bad".
Or, can Bayes (correctly) be trained that "served cold" might be "good" or "bad" depending on what it is associated with?
I found a lot of good info on Bayes, here and elsewhere, but was unable to determine whether it's suitable for this type of application, where the answer to whether a phrase is good or bad is "it depends".
The Bayes classifier is considered the ideal case, in which the probability structure underlying the categories is known perfectly. Why is it that with the Bayes classifier we achieve the best performance that can be achieved?
It can be shown that, of all classifiers, the optimal Bayes classifier is the one with the lowest probability of misclassifying an observation, i.e. the lowest probability of error. So if we know the posterior distribution, then using the Bayes classifier is as good as it gets.
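In symbols (a standard textbook formulation, added here for reference rather than taken from the original post): the Bayes classifier assigns each observation x to the class with the largest posterior probability, and its error rate lower-bounds that of any other classifier:

    \hat{y}(x) = \arg\max_k P(C_k \mid x), \qquad
    P(\mathrm{error}) = \mathbb{E}_x\big[\, 1 - \max_k P(C_k \mid x) \,\big]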
A Naive Bayes classifier calculates the probability of an event in three steps:
Step 1: Calculate the prior probability for each class label.
Step 2: Find the likelihood of each attribute value given each class.
Step 3: Plug these values into Bayes' formula and calculate the posterior probability for each class.
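Here is a minimal sketch of those three steps in Ruby (my own illustration, assuming bag-of-words text features and Laplace smoothing; the NaiveBayes class and its methods are hypothetical, not from any library):

    # Naive Bayes text classifier: counts words per class during training,
    # then scores new text by log-prior + summed log-likelihoods.
    class NaiveBayes
      def initialize
        @class_counts = Hash.new(0)                              # documents seen per class
        @word_counts  = Hash.new { |h, k| h[k] = Hash.new(0) }   # word counts per class
        @vocab        = {}                                       # global vocabulary
        @total_docs   = 0
      end

      # Step 1 data: document counts per class give the priors.
      # Step 2 data: word counts per class give the likelihoods.
      def train(text, label)
        @class_counts[label] += 1
        @total_docs += 1
        text.downcase.split.each do |word|
          @word_counts[label][word] += 1
          @vocab[word] = true
        end
      end

      # Step 3: pick the class maximizing log P(class) + sum of log P(word | class),
      # with add-one (Laplace) smoothing so unseen words don't zero out a class.
      def classify(text)
        words = text.downcase.split
        @class_counts.keys.max_by do |label|
          prior = Math.log(@class_counts[label].to_f / @total_docs)
          total = @word_counts[label].values.sum
          words.sum(prior) do |word|
            Math.log((@word_counts[label][word] + 1.0) / (total + @vocab.size))
          end
        end
      end
    end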
In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. For example, a fruit may be considered an apple if it is red, round, and about 3 inches in diameter; a Naive Bayes classifier treats each of these features as contributing independently to the probability that the fruit is an apple, regardless of any correlations between them.
A Naive Bayes classifier assumes independence between attributes. For example, assume you have the following data:
    name     type        color    class
    ------   ---------   ------   -----
    apple    fruit       red      BAD
    apple    fruit       green    BAD
    banana   fruit       yellow   GOOD
    tomato   vegetable   red      GOOD
Independence means that the attributes (name, type, color) are independent of one another; for example, that knowing the name is "apple" would tell you nothing about whether the type is "fruit" or "vegetable". In this data the attributes "name" and "type" are clearly dependent, so a Naive Bayes classifier is too naive (it would likely classify "apple fruit yellow" as BAD because it counts being an apple AND being a fruit as separate evidence -- but aren't all apples fruits?).
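To make the "apple fruit yellow" case concrete, here is the arithmetic (my own calculation, using add-one smoothing over the four rows above; "name" has 3 possible values, "type" has 2, "color" has 3):

    # P(class) * P(apple | class) * P(fruit | class) * P(yellow | class)
    p_bad  = (2.0/4) * (3.0/5) * (3.0/4) * (1.0/5)   # => 0.045
    p_good = (2.0/4) * (1.0/5) * (2.0/4) * (2.0/5)   # => 0.020

BAD wins, precisely because "apple" and "fruit" are counted as two independent pieces of evidence even though they carry the same information.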
To answer your original question: a Naive Bayes classifier assumes that the class (GOOD or BAD) depends on each attribute independently, which isn't the case here -- I like my pizza hot and my soda cold.
EDIT: If you're looking for a classifier that has some utility but may in theory produce numerous Type I and Type II errors, Naive Bayes is such a classifier. Naive Bayes is better than nothing, but there's measurable value in using a less naive classifier.
I wouldn't dismiss Bayes as quickly as Daniel suggests. The quality (performance, in math-speak) of a Bayes classifier depends above all on the amount and quality of the training data, and on the assumptions you make when you develop your algorithm.
To give you a short example: if you feed it only {'beer cold' => :good, 'pizza cold' => :bad}, the word 'cold' won't actually affect the classification. It will just decide that all beers are good and all pizzas are bad (see how smart it is? :)).
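Running the NaiveBayes sketch from earlier on exactly that training set shows the effect: 'cold' appears once in each class, so its smoothed likelihood is identical on both sides and cancels out of the comparison, leaving 'beer' and 'pizza' to decide:

    nb = NaiveBayes.new
    nb.train('beer cold',  :good)
    nb.train('pizza cold', :bad)
    nb.classify('beer cold')    # => :good -- decided entirely by 'beer'
    nb.classify('pizza cold')   # => :bad  -- decided entirely by 'pizza'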
Anyway, an answer here is too short to explain this in detail, so I would recommend reading Paul Graham's essay on how he developed his spam filter -- note that he built his own algorithm based on Bayes rather than using an off-the-shelf classifier. In my (so far short) experience, it seems you are better off following his lead and developing a version of the algorithm specific to the problem at hand, so that you keep control over the various domain-specific assumptions.
You can follow my attempts (in ruby) here if you are interested: http://arubyguy.com/2011/03/03/bayes-classification-update/