I'm looking for a Pythonic interface to load ARPA files (back-off language models) and use them to evaluate some text, e.g. to get its log-probability, perplexity, etc.
I don't need to generate the ARPA file in Python, only to use it for querying.
Does anybody have a recommended package? I already saw kenlm and swig-srilm, but the first is very hard to set up on Windows and the second no longer seems to be maintained.
I found a nice package, still under development, called pynlpl. It does exactly what I need with very few dependencies (libxml2 is about enough), and it provides a pure-Python implementation for reading ARPA files.
What about the ARPA package?
It's rather lightweight, and its API is intuitive and easy to learn. It's not as fast as kenlm, but it may still be worth a try.
https://pypi.org/project/arpa/
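For anyone wondering what such a package has to do when it queries an ARPA file, here is a minimal pure-Python sketch of the standard back-off lookup. It is not the API of arpa, pynlpl or kenlm: the names `load_arpa`, `log10_prob`, `sentence_score`, `perplexity` and the toy model `TOY_ARPA` are made up for illustration. It assumes a single model per file, whitespace-separated fields, and that unknown words map to `<unk>` only if the model lists it.

```python
# Toy bigram model in ARPA format (hypothetical numbers, for illustration only).
TOY_ARPA = r"""
\data\
ngram 1=4
ngram 2=2

\1-grams:
-1.0 <s> -0.5
-0.7 </s>
-0.5 a -0.3
-0.6 b -0.2

\2-grams:
-0.2 <s> a
-0.4 a b

\end\
"""


def load_arpa(lines):
    """Parse ARPA text into {ngram_tuple: (log10_prob, log10_backoff)}."""
    table = {}
    order = 0  # 0 while we are outside any "\N-grams:" section
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("\\"):
            # Section markers: \data\, \1-grams:, \2-grams:, ..., \end\
            if line.endswith("-grams:"):
                order = int(line[1:line.index("-")])
            else:
                order = 0
            continue
        if order == 0:
            continue  # skip the "ngram N=count" header lines
        fields = line.split()
        words = tuple(fields[1:1 + order])
        # The back-off weight is optional; a missing one means log10(1) = 0.
        backoff = float(fields[1 + order]) if len(fields) > 1 + order else 0.0
        table[words] = (float(fields[0]), backoff)
    return table


def log10_prob(table, context, word):
    """log10 P(word | context), backing off to shorter contexts as needed."""
    ngram = context + (word,)
    if ngram in table:
        return table[ngram][0]
    if not context:
        # Out-of-vocabulary word: use <unk> if the model has it.
        if ("<unk>",) in table:
            return table[("<unk>",)][0]
        raise KeyError(word)
    # Add the context's back-off weight (0 if the context is unseen),
    # then retry with the oldest context word dropped.
    backoff = table.get(context, (0.0, 0.0))[1]
    return backoff + log10_prob(table, context[1:], word)


def sentence_score(table, sentence, order):
    """Return (total log10 probability, number of scored tokens)."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for i in range(1, len(words)):  # <s> itself is never predicted
        context = tuple(words[max(0, i - order + 1):i])
        total += log10_prob(table, context, words[i])
    return total, len(words) - 1


def perplexity(total_log10, n_tokens):
    """Perplexity from a base-10 log-probability over n_tokens predictions."""
    return 10 ** (-total_log10 / n_tokens)


table = load_arpa(TOY_ARPA.splitlines())
total, n = sentence_score(table, "a b", order=2)
print(total, perplexity(total, n))
```

`load_arpa` takes any iterable of lines, so an open file object works as well as the split string used here. A real package will also handle things this sketch ignores, such as files containing several models and faster lookup structures.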