
Stupid Backoff implementation clarification

Tags: nlp, smoothing

Hello people, I'm implementing the Stupid Backoff (page 2, equation 5) smoothing technique for a project I'm working on, and I have a question about its implementation. This is a smoothing algorithm used in NLP; Good-Turing is, I guess, the best-known similar algorithm.

A brief description of the algorithm: when trying to find the probability of a word appearing in a sentence, it first looks for context for the word at the n-gram level, and if there is no n-gram of that size, it recurses to the (n-1)-gram and multiplies its score by 0.4. The recursion stops at unigrams.

So if I want to find the probability of "day" in the context of "a sunny day", it would first check whether the trigram "a sunny day" exists in the corpus; if not, it would try the same with the bigram "sunny day", and finally it would just take the frequency of "day" divided by the corpus size (the total number of words in the training data).
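For reference, here is roughly how I'm collecting the counts that get looked up (a minimal sketch; build_counts and the toy corpus are just for illustration):

```python
from collections import Counter

def build_counts(tokens, max_n=3):
    # Count every n-gram from unigrams up to max_n-grams.
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

tokens = "a good day is a nice day".split()
counts = build_counts(tokens)
corpus_size = len(tokens)  # N, the total number of word tokens
```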

My question is: do I multiply the score by 0.4 every time I reduce the size of the n-gram?

So in the above example if we are not able to find a tri-gram or bi-gram the final score would be:

0.4 * 0.4 * frequency(day) / corpus_size?

Or do I multiply only once, at the final level, so that regardless of how many backoffs I have to make, I multiply the final score by 0.4 just once?
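In code, the two readings I'm weighing look like this (reusing the illustrative counts and corpus_size from the sketch above):

```python
# Reading 1: multiply by 0.4 on every backoff step (two steps here).
score_per_step = 0.4 * 0.4 * counts[("day",)] / corpus_size

# Reading 2: multiply by 0.4 only once, at the terminal level.
score_once = 0.4 * counts[("day",)] / corpus_size
```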

asked May 05 '13 by Bar


1 Answer

Basically, I read equation 5 the same way you describe in your math above.

So for "a sunny day", where no instance was observed, you would calculate S("day" | "a sunny"). Not finding the trigram "a sunny day", you take case two of equation 5 and estimate S("day" | "a sunny") as alpha * S("day" | "sunny").

If, again, you recorded no occurrences of "sunny day", you would approximate S("day" | "sunny") as alpha * S("day"), which is the terminal case f("day") / N (N being the corpus size, i.e. the total number of observed unigram tokens).

By setting alpha to 0.4 you get exactly what you wrote out above.
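A minimal sketch of that recursion, reading counts as a mapping from n-gram tuples to frequencies (the names here are illustrative, not from the paper):

```python
def stupid_backoff(ngram, counts, corpus_size, alpha=0.4):
    # ngram is a tuple of words; scores the last word given the rest.
    if len(ngram) == 1:
        # Terminal case: unigram relative frequency f(w) / N.
        return counts[ngram] / corpus_size
    context = ngram[:-1]
    if counts[ngram] > 0:
        # Case one of equation 5: the full n-gram was observed.
        return counts[ngram] / counts[context]
    # Case two: drop the leftmost context word and scale by alpha.
    return alpha * stupid_backoff(ngram[1:], counts, corpus_size, alpha)

# stupid_backoff(("a", "sunny", "day"), counts, corpus_size)
# -> 0.4 * 0.4 * counts[("day",)] / corpus_size when neither
#    "a sunny day" nor "sunny day" was observed.
```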

Hope this helps.

-bms20

answered Jun 14 '23 by bms20