
Package to generate n-gram language models with smoothing? (Alternatives to NLTK)

Tags:

nlp

nltk

n-gram

I'd like to find a package or module (preferably Python or Perl, but others would do) that automatically generates n-gram probabilities from an input text and can also apply one or more smoothing algorithms.

That is, I am looking for something like the NLTK NgramModel class. I can't use it for my purposes because bugs in its smoothing functions make it choke when you ask for the probability of a word it hasn't seen before.

I've read through the NLTK dev forums, and as of now there seems to be no progress on this.

Any alternatives out there?

Alan H. asked Oct 11 '22 12:10



1 Answer

Looks like I answered my own question, so I'll mention what I've found here in case others are looking for it.

There are two toolkits that I've found:

  • SRILM

  • The CMU-Cambridge Statistical Language Modeling Toolkit

They appear to have very similar functionality. Both include a variety of smoothing functions.
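For readers who only need to see what smoothing buys you for unseen words, here is a minimal sketch in plain Python. It is not the API of SRILM or the CMU-Cambridge toolkit, and it uses add-one (Laplace) smoothing, which is far simpler than the discounting methods those toolkits offer. The class name `AddOneBigramModel` and the `<unk>` token are assumptions made for illustration.

```python
from collections import Counter


class AddOneBigramModel:
    """Illustrative bigram model with add-one (Laplace) smoothing."""

    def __init__(self, tokens):
        # Reserve an <unk> entry so out-of-vocabulary words have a slot.
        self.vocab = set(tokens) | {"<unk>"}
        # How often each token appears as the first element of a bigram.
        self.context_counts = Counter(tokens[:-1])
        # How often each adjacent pair (prev, word) appears.
        self.bigram_counts = Counter(zip(tokens, tokens[1:]))

    def prob(self, prev, word):
        """Return the smoothed estimate of P(word | prev)."""
        # Map anything not seen in training to <unk>.
        if prev not in self.vocab:
            prev = "<unk>"
        if word not in self.vocab:
            word = "<unk>"
        # Add one to every bigram count; add |V| to the context count
        # so the distribution over the vocabulary still sums to 1.
        numerator = self.bigram_counts[(prev, word)] + 1
        denominator = self.context_counts[prev] + len(self.vocab)
        return numerator / denominator


tokens = "the cat sat on the mat".split()
model = AddOneBigramModel(tokens)
seen = model.prob("the", "cat")    # bigram observed in training
unseen = model.prob("the", "dog")  # "dog" never observed, still non-zero
```

Because every count is incremented, a query for a word the model has never seen returns a small non-zero probability instead of failing, which is the behaviour the question was after.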

Alan H. answered Dec 20 '22 15:12
