I'm trying to build an n-gram Markov model from a given piece of text, and then access its transition table so I can calculate the conditional entropy for each sequence of n words (the grams). For example, in a 2-gram model, after reading in a corpus of text
"dogs chase cats dogs chase cats dogs chase cats dogs chase cats dogs chase cats dogs chase cats dogs chase cats dogs chase cats dogs chase cats dogs chase people"
and building an internal transition table, the state "dogs chase" may transition to the state "chase cats" with probability 0.9, and to the state "chase people" with probability 0.1. If I know the possible transitions and their probabilities, I can calculate the conditional entropy.
Are there any good Python libraries for doing this? I've checked NLTK, SRILM, and others but haven't found much.
It's been just over 4 years since this question was first posted, and I found myself with the same problem. While it is possible to do this manually, I've gone ahead and created the adaptationism package, which provides a bit more functionality!
Not only can you access the transition tables, but you can also do this for n-grams of any size.
I will continue to build out this toolkit as time goes on, and please feel free to ping me with suggestions for future functionality!
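For anyone who wants to see the manual route, here is a minimal sketch using only the Python standard library. It is not the package's API: the function names `build_transitions` and `conditional_entropy` are made up for illustration, and it assumes whitespace tokenization and entropy measured in bits.

```python
from collections import Counter, defaultdict
from math import log2


def build_transitions(text, n=2):
    """Map each n-gram state to a dict of {next_state: probability}."""
    words = text.split()  # assumption: simple whitespace tokenization
    grams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

    # Count how often each state is immediately followed by each next state.
    counts = defaultdict(Counter)
    for current, nxt in zip(grams, grams[1:]):
        counts[current][nxt] += 1

    # Normalize each row of counts into probabilities.
    table = {}
    for state, followers in counts.items():
        total = sum(followers.values())
        table[state] = {nxt: c / total for nxt, c in followers.items()}
    return table


def conditional_entropy(distribution):
    """Entropy (in bits) of the next-state distribution for one state."""
    return -sum(p * log2(p) for p in distribution.values() if p > 0)


# The corpus from the question: nine "dogs chase cats", then "dogs chase people".
corpus = " ".join(["dogs chase cats"] * 9 + ["dogs chase people"])
table = build_transitions(corpus, n=2)

row = table[("dogs", "chase")]
print(row)
print(conditional_entropy(row))
```

On the corpus from the question, the row for `("dogs", "chase")` holds the 0.9 and 0.1 probabilities described there, and its entropy comes out to roughly 0.47 bits. States with a single possible successor, such as `("chase", "cats")`, have zero entropy.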