Naive Bayesian and zero-frequency issue

Tags: algorithm, machine-learning, spam-prevention, bayesian

I think I've implemented most of it correctly. One part confused me:

The zero-frequency problem: Add 1 to the count for every attribute value-class combination (Laplace estimator) when an attribute value doesn’t occur with every class value.

Here's some of my client code:

// Classify
string text = "Claim your free Macbook now!";
double posteriorProbSpam = classifier.Classify(text, "spam");
Console.WriteLine("-------------------------");
double posteriorProbHam = classifier.Classify(text, "ham");

Now say the word 'free' is present in the training data somewhere:

// Training
classifier.Train("ham", "Attention: Collect your Macbook from store.");
// ... lot more here ...
classifier.Train("spam", "Free macbook offer expiring.");

But the word is present in my training data for the category 'spam' only, not for 'ham'. So when I go to calculate posteriorProbHam, what do I do when I come across the word 'free'?

asked Aug 25 '12 09:08

Science_Fiction

1 Answer

Still add one. The reason: Naive Bayes models P("free" | spam) and P("free" | ham) as being completely independent, so you want to estimate each probability on its own. The Laplace estimator you're using for P("free" | spam) is (count("free" | spam) + 1) / count(spam); the estimator for P("free" | ham) has the same form, using the ham counts. (Strictly, add-one smoothing also adds the number of distinct words to the denominator, so that the estimates for a class sum to 1.)

If you think about what it would mean to not add one, it wouldn't really make sense: seeing "free" one time in ham would make it less likely to see "free" in spam.
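
To make that concrete, here is a minimal sketch in C#. It is not your classifier: `SmoothedCounts`, `WordProbability` and the tokenizer are made-up names for illustration. It assumes a multinomial word-count model and uses the usual add-one form whose denominator also adds the vocabulary size, which goes slightly beyond the simplified formula above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not the asker's classifier): per-class word counts
// with add-one (Laplace) smoothing.
public class SmoothedCounts
{
    // class label -> (word -> number of occurrences in that class)
    private readonly Dictionary<string, Dictionary<string, int>> counts =
        new Dictionary<string, Dictionary<string, int>>();

    // every distinct word seen in any class
    private readonly HashSet<string> vocabulary = new HashSet<string>();

    public void Train(string label, string text)
    {
        if (!counts.TryGetValue(label, out var wordCounts))
        {
            wordCounts = new Dictionary<string, int>();
            counts[label] = wordCounts;
        }
        foreach (var word in Tokenize(text))
        {
            wordCounts.TryGetValue(word, out int n); // n stays 0 for a new word
            wordCounts[word] = n + 1;
            vocabulary.Add(word);
        }
    }

    // P(word | label) = (count(word, label) + 1) / (words in label + vocabulary size)
    // The +1 is applied to every word in every class, seen or not.
    public double WordProbability(string word, string label)
    {
        var wordCounts = counts[label];
        wordCounts.TryGetValue(word.ToLowerInvariant(), out int n);
        int total = wordCounts.Values.Sum();
        return (n + 1.0) / (total + vocabulary.Count);
    }

    // Assumption: a very crude tokenizer, lowercase and split on basic punctuation.
    private static IEnumerable<string> Tokenize(string text)
    {
        return text.ToLowerInvariant().Split(
            new[] { ' ', '.', ',', ':', '!', '?' },
            StringSplitOptions.RemoveEmptyEntries);
    }
}

public static class Demo
{
    public static void Main()
    {
        var model = new SmoothedCounts();
        model.Train("ham", "Attention: Collect your Macbook from store.");
        model.Train("spam", "Free macbook offer expiring.");

        Console.WriteLine(model.WordProbability("free", "spam"));
        Console.WriteLine(model.WordProbability("free", "ham"));
    }
}
```

With only the two training lines from the question, the vocabulary has 9 distinct words, so this gives P("free" | spam) = 2/13 and P("free" | ham) = 1/15. The ham estimate is small but nonzero, so a single unseen word no longer zeroes out the whole product for ham.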

answered Sep 18 '22 00:09

Danica
