I'd like to find a package or module (preferably Python or Perl, but other languages would do) that automatically generates n-gram probabilities from an input text and can also apply one or more smoothing algorithms.
That is, I am looking for something like the NLTK NgramModel class. I can't use it for my purposes because bugs in the smoothing functions make it choke when you ask for the probability of a word it hasn't seen before.
I've read through the dev forums for NLTK and as of now there seems to be no progress on this.
Any alternatives out there?
Looks like I answered my own question, so I'll mention what I've found here in case others are looking for it.
There are two toolkits that I've found:
SRILM
The CMU-Cambridge Statistical Language Modeling Toolkit
They appear to have very similar functionality. Both include a variety of smoothing functions.
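For anyone who only needs the basic behaviour (a nonzero probability for a word the model hasn't seen), here is a minimal pure-Python sketch of a bigram model with add-k (Lidstone) smoothing. It is not taken from SRILM or the CMU-Cambridge toolkit and does not reproduce their APIs; the class name `AddKBigramModel`, the `<unk>` token, and the parameter `k` are my own illustrative choices, and add-k is just one simple smoothing method among those the toolkits offer.

```python
from collections import Counter


class AddKBigramModel:
    """Bigram model with add-k smoothing (illustrative sketch, not a toolkit API)."""

    UNK = "<unk>"  # hypothetical placeholder for out-of-vocabulary words

    def __init__(self, tokens, k=1.0):
        self.k = k
        # Vocabulary is every observed token plus one slot for unseen words.
        self.vocab = set(tokens) | {self.UNK}
        # How often each token appears as the first element of a bigram.
        self.context_counts = Counter(tokens[:-1])
        # How often each (previous, current) pair appears.
        self.bigram_counts = Counter(zip(tokens, tokens[1:]))

    def _normalize(self, word):
        # Map anything outside the vocabulary to the unknown token.
        return word if word in self.vocab else self.UNK

    def prob(self, prev, word):
        """Return the smoothed estimate of P(word | prev)."""
        prev = self._normalize(prev)
        word = self._normalize(word)
        numerator = self.bigram_counts[(prev, word)] + self.k
        denominator = self.context_counts[prev] + self.k * len(self.vocab)
        return numerator / denominator


tokens = "the cat sat on the mat".split()
model = AddKBigramModel(tokens, k=1.0)
print(model.prob("the", "cat"))  # seen bigram: (1 + 1) / (2 + 6) = 0.25
print(model.prob("the", "dog"))  # unseen word: (0 + 1) / (2 + 6) = 0.125
```

Because every count gets `k` added and the denominator grows by `k` times the vocabulary size, the probabilities for a given context still sum to 1, and an unseen word never raises an error or returns zero. For serious work the toolkits above are a better choice, since add-k tends to give too much mass to unseen events.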