Package to generate n-gram language models with smoothing? (Alternatives to NLTK)

Question

I'd like to find a package or module (preferably Python or Perl, but others would do) that automatically generates n-gram probabilities from an input text and can also apply one or more smoothing algorithms.

That is, I am looking for something like the NLTK NgramModel class. I can't use it for my purposes because bugs in its smoothing functions make it choke when you ask for the probability of a word it hasn't seen before.

I've read through the dev forums for NLTK and as of now there seems to be no progress on this.

Any alternatives out there?

Alan H. · Accepted Answer

Looks like I answered my own question, so I'll mention what I've found here in case others are looking for it.

There are two toolkits that I've found:

SRILM
The CMU-Cambridge Statistical Language Modeling Toolkit

They appear to have very similar functionality. Both include a variety of smoothing functions.
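For anyone who just wants to see what smoothing buys you before installing either toolkit, here is a minimal Python sketch of a bigram model with add-one (Laplace) smoothing. It is not how SRILM or the CMU-Cambridge toolkit work internally, and the class name `AddOneBigramModel` and the `<unk>` token are names I made up for illustration. The point is only that an unseen word gets a small nonzero probability instead of causing an error.

```python
from collections import Counter


class AddOneBigramModel:
    """Bigram model with add-one (Laplace) smoothing. Illustrative sketch only."""

    UNK = "<unk>"  # hypothetical placeholder for words not seen in training

    def __init__(self, tokens):
        # The vocabulary includes one extra slot for unseen words.
        self.vocab = set(tokens) | {self.UNK}
        # How often each token appears as the first element of a bigram.
        self.context_counts = Counter(tokens[:-1])
        # How often each (previous, current) pair appears.
        self.bigram_counts = Counter(zip(tokens, tokens[1:]))

    def prob(self, prev, word):
        """Return the smoothed estimate of P(word | prev)."""
        # Map anything outside the training vocabulary to the <unk> slot.
        if prev not in self.vocab:
            prev = self.UNK
        if word not in self.vocab:
            word = self.UNK
        v = len(self.vocab)
        # Add one to every bigram count; add V to the context count so the
        # probabilities over the vocabulary still sum to 1.
        return (self.bigram_counts[(prev, word)] + 1) / (self.context_counts[prev] + v)


model = AddOneBigramModel("the cat sat on the mat".split())
print(model.prob("the", "cat"))  # seen bigram: (1 + 1) / (2 + 6) = 0.25
print(model.prob("the", "dog"))  # unseen word: (0 + 1) / (2 + 6) = 0.125
```

Add-one smoothing is the simplest option and tends to give too much mass to unseen events on real corpora, so for serious work the smoothing methods shipped with the toolkits above are the better choice.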

Tags: nlp, nltk, n-gram
