
Stupid Backoff implementation clarification

Tags: nlp, smoothing

Hello people, I'm implementing the Stupid Backoff (page 2, equation 5) smoothing technique for a project I'm working on, and I have a question about its implementation. This is a smoothing algorithm used in NLP; Good-Turing is, I guess, the best-known similar algorithm.

A brief description of the algorithm: when trying to find the probability of a word appearing in a sentence, it first looks for context for the word at the n-gram level, and if there is no n-gram of that size, it recurses to the (n-1)-gram and multiplies its score by 0.4. The recursion stops at unigrams.

So if I want to find the probability of "day" in the context of "a sunny day", it would first check whether the trigram "a sunny day" exists in the corpus; if not, it would try the same with the bigram "sunny day", and finally it would just take the frequency of "day" divided by the corpus size (the total number of words in the training data).
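For reference, here is roughly how I'm collecting the counts that get looked up (a minimal sketch; build_counts and the toy corpus are just for illustration):

```python
from collections import Counter

def build_counts(tokens, max_n=3):
    # Count every n-gram from unigrams up to max_n-grams.
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

tokens = "a good day is a nice day".split()
counts = build_counts(tokens)
corpus_size = len(tokens)  # N, the total number of word tokens
```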

My question is: do I multiply the score by 0.4 every time I reduce the size of the n-gram?

So in the above example if we are not able to find a tri-gram or bi-gram the final score would be:

0.4 * 0.4 * frequency(day) / corpus_size?

Or do I multiply only once, at the final level, so that regardless of how many backoffs I have to make, I multiply the final score by 0.4 just once?
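In code, the two readings I'm weighing look like this (reusing the illustrative counts and corpus_size from the sketch above):

```python
# Reading 1: multiply by 0.4 on every backoff step (two steps here).
score_per_step = 0.4 * 0.4 * counts[("day",)] / corpus_size

# Reading 2: multiply by 0.4 only once, at the terminal level.
score_once = 0.4 * counts[("day",)] / corpus_size
```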

asked May 05 '13 by Bar


1 Answer

Basically, I read equation 5 the same way you describe in your math above.

So for "a sunny day", where no instance was observed, you would calculate S("day" | "a sunny"). Not finding the trigram "a sunny day", you take case two of equation 5 and estimate S("day" | "a sunny") as alpha * S("day" | "sunny").

If, again, you recorded no occurrences of "sunny day", you would approximate S("day" | "sunny") as alpha * S("day"), which is the terminal case f("day") / N (N being the corpus size, i.e. the total number of observed unigram tokens).

By setting alpha to 0.4 you get exactly what you wrote out above.
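A minimal sketch of that recursion, reading counts as a mapping from n-gram tuples to frequencies (the names here are illustrative, not from the paper):

```python
def stupid_backoff(ngram, counts, corpus_size, alpha=0.4):
    # ngram is a tuple of words; scores the last word given the rest.
    if len(ngram) == 1:
        # Terminal case: unigram relative frequency f(w) / N.
        return counts[ngram] / corpus_size
    context = ngram[:-1]
    if counts[ngram] > 0:
        # Case one of equation 5: the full n-gram was observed.
        return counts[ngram] / counts[context]
    # Case two: drop the leftmost context word and scale by alpha.
    return alpha * stupid_backoff(ngram[1:], counts, corpus_size, alpha)

# stupid_backoff(("a", "sunny", "day"), counts, corpus_size)
# -> 0.4 * 0.4 * counts[("day",)] / corpus_size when neither
#    "a sunny day" nor "sunny day" was observed.
```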

Hope this helps.

-bms20

answered Jun 14 '23 by bms20