Given bigram probabilities for words in a text, how would one compute trigram probabilities?
For example, if we know that P(dog cat) = 0.3 and P(cat mouse) = 0.2,
how do we find P(dog cat mouse)?
Thank you!
In the following I treat a trigram as three random variables A, B, C. So dog cat mouse would be A=dog, B=cat, C=mouse.
Using the chain rule: P(A,B,C) = P(A,B) * P(C|A,B). Now you're stuck if you want to stay exact, because bigram probabilities alone do not determine P(C|A,B).
What you can do is assume that C is independent of A given B. Then P(C|A,B) = P(C|B), and P(C|B) = P(C,B) / P(B), which you should be able to compute from your bigram probabilities. Note that in your case P(C|B) should really be the probability of C following a B, so it's the probability of the bigram BC divided by the probability of B*, where B* stands for any bigram that starts with B.
So to sum it up, when using the conditional independence assumption:
P(ABC) = P(AB) * P(BC) / P(B*)
And to compute P(B*) you have to sum up the probabilities of all bigrams beginning with B.
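Here is a minimal Python sketch of that formula, in case it helps. The function name trigram_prob and the toy bigram table are made up for illustration; only P(dog cat) = 0.3 and P(cat mouse) = 0.2 come from the question, and the other two entries are assumptions added so the table sums to 1. The result is only an approximation, since it relies on the conditional independence assumption above.

```python
def trigram_prob(bigram_probs, a, b, c):
    """Approximate P(a b c) as P(a b) * P(b c) / P(b *).

    bigram_probs maps (first_word, second_word) -> joint bigram probability.
    """
    # P(b *): total probability of all bigrams whose first word is b
    p_b_any = sum(p for (first, _), p in bigram_probs.items() if first == b)
    if p_b_any == 0:
        # b never starts a bigram, so no trigram can continue past it
        return 0.0
    p_ab = bigram_probs.get((a, b), 0.0)
    p_bc = bigram_probs.get((b, c), 0.0)
    return p_ab * p_bc / p_b_any


# Hypothetical toy table; only the first two entries are from the question.
bigram_probs = {
    ("dog", "cat"): 0.3,
    ("cat", "mouse"): 0.2,
    ("cat", "dog"): 0.2,    # assumed
    ("mouse", "cat"): 0.3,  # assumed
}

p = trigram_prob(bigram_probs, "dog", "cat", "mouse")
print(p)
```

With this toy table, P(cat *) = 0.2 + 0.2 = 0.4, so the estimate works out to 0.3 * 0.2 / 0.4, roughly 0.15. With a different table the value of P(cat *) changes, which is why the two numbers in the question are not enough on their own.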