I think I've implemented most of it correctly. One part confused me:
The zero-frequency problem: Add 1 to the count for every attribute value-class combination (Laplace estimator) when an attribute value doesn’t occur with every class value.
Here's some of my client code:
//Clasify
string text = "Claim your free Macbook now!";
double posteriorProbSpam = classifier.Classify(text, "spam");
Console.WriteLine("-------------------------");
double posteriorProbHam = classifier.Classify(text, "ham");
Now say the word 'free' is present in the training data somewhere
//Training
classifier.Train("ham", "Attention: Collect your Macbook from store.");
*Lot more here*
classifier.Train("spam", "Free macbook offer expiring.");
But the word is present in my training data for category 'spam' only not in 'ham'. So when I go to calculate posteriorProbHam what do i do when I come across the word 'free'.
Disadvantages of Naive Bayes If your test data set has a categorical variable of a category that wasn't present in the training data set, the Naive Bayes model will assign it zero probability and won't be able to make any predictions in this regard.
Naive Bayes assumes that all predictors (or features) are independent, rarely happening in real life. This limits the applicability of this algorithm in real-world use cases.
Working of Naïve Bayes' Classifier: So to solve this problem, we need to follow the below steps: Convert the given dataset into frequency tables. Generate Likelihood table by finding the probabilities of given features. Now, use Bayes theorem to calculate the posterior probability.
Laplace smoothing is a smoothing technique that helps tackle the problem of zero probability in the Naïve Bayes machine learning algorithm. Using higher alpha values will push the likelihood towards a value of 0.5, i.e., the probability of a word equal to 0.5 for both the positive and negative reviews.
Still add one. The reason: Naive Bayes models P("free" | spam)
and P("free" | ham)
as being completely independent, so you want to estimate the probability of each completely independently. The Laplace estimator you're using for P("free" | spam)
is (count("free" | spam) + 1) / count(spam)
; P("ham" | spam)
is the same.
If you think about what it would mean to not add one, it wouldn't really make sense: seeing "free" one time in ham would make it less likely to see "free" in spam.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With