Finding the conditional probability of a trigram in Python NLTK

Question

I have started learning NLTK and am following a tutorial that computes conditional probabilities from bigrams like this:

import nltk
from nltk.corpus import brown
cfreq_brown_2gram = nltk.ConditionalFreqDist(nltk.bigrams(brown.words()))

However, I want to find the conditional probability using trigrams. When I change nltk.bigrams to nltk.trigrams, I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "home/env/local/lib/python2.7/site-packages/nltk/probability.py", line 1705, in __init__
    for (cond, sample) in cond_samples:
ValueError: too many values to unpack (expected 2)

How can I calculate the conditional probability using trigrams?

Ilia Kurenkov · Accepted Answer

nltk.ConditionalFreqDist expects its data as a sequence of (condition, item) 2-tuples. nltk.trigrams yields tuples of length 3, so the unpacking in "for (cond, sample) in cond_samples" fails with the exact error you posted.
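The failure can be reproduced without NLTK at all. The following is a minimal sketch in plain Python; the sample trigram is made up for illustration and does not come from the Brown corpus.

```python
# Hypothetical sample trigram, used only for illustration.
trigram = ("the", "quick", "brown")

# Unpacking three values into two names raises ValueError,
# which is what happens inside ConditionalFreqDist.__init__.
try:
    cond, sample = trigram
    message = None
except ValueError as err:
    message = str(err)

# Regrouping the first two words into one tuple gives a valid
# (condition, item) pair.
w0, w1, w2 = trigram
cond, sample = ((w0, w1), w2)
```

After regrouping, the condition is the pair of leading words and the sample is the third word, which is the shape ConditionalFreqDist can consume.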

Your post doesn't say what you want to use as conditions, but the convention in language modeling is to condition the last word on its predecessors, which for trigrams means the pair of words that precede it. The following code shows how to do that.

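import nltk
from nltk.corpus import brown
# Regroup each trigram as ((w0, w1), w2): the first two words are the condition.
brown_trigrams = nltk.trigrams(brown.words())
condition_pairs = (((w0, w1), w2) for w0, w1, w2 in brown_trigrams)
cfd_brown = nltk.ConditionalFreqDist(condition_pairs)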
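To get an actual probability out of a table built this way, one option is to look up the FreqDist stored under a condition and call its freq method, which returns the relative frequency of a sample (a maximum-likelihood estimate, with no smoothing). The sketch below assumes a tiny made-up token list in place of brown.words() so that it runs without downloading a corpus; the names tokens and p_sat are illustrative only.

```python
from nltk import ConditionalFreqDist, trigrams

# Toy token list standing in for brown.words() (an assumption for this sketch).
tokens = "the cat sat on the mat the cat ran".split()

# Same regrouping as above: condition on the two preceding words.
pairs = (((w0, w1), w2) for w0, w1, w2 in trigrams(tokens))
cfd = ConditionalFreqDist(pairs)

# cfd[condition] is a FreqDist over the words seen after that condition.
after_the_cat = cfd[("the", "cat")]

# Relative frequency of "sat" given the context ("the", "cat"):
# count(the, cat, sat) / count(the, cat, *)
p_sat = after_the_cat.freq("sat")
print(p_sat)
```

In this toy data the context ("the", "cat") is followed once by "sat" and once by "ran", so the estimate is 0.5. With the Brown corpus, the same lookup on the table built from brown.words() would apply, though unseen contexts get a probability of zero unless some smoothing is added.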

Tags: python, nlp, nltk, n-gram

Riken Shah
