I am trying to devise an iterative Markov decision process (MDP) agent in Python with the following characteristics:
So the basic idea is that the agent should make its best reward-optimized move at time T using its current probability model (and since the model is probabilistic, the move it makes is naturally stochastic), then couple the new input state at T+1 with the reward from the previous move at T and re-evaluate the model. Convergence must not be permanent, since the rewards may change over time or the set of available actions could change.
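To make the loop concrete, here is a minimal sketch of what I have in mind, using tabular Q-learning as a stand-in (every name here, OnlineMDPAgent included, is hypothetical and mine, not from an existing library):

```python
import random
from collections import defaultdict

class OnlineMDPAgent:
    """Hypothetical sketch: tabular Q-learning that acts at time T, then
    folds the observed reward and the state at T+1 back into its model."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.actions = actions        # could be swapped out if the action set changes
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount factor
        self.epsilon = epsilon        # exploration rate, so convergence is never permanent
        self.q = defaultdict(float)   # (state, action) -> estimated value

    def reward(self, state, action, next_state):
        # Meant to be overridden in a derived class (see below).
        raise NotImplementedError

    def act(self, state):
        # Epsilon-greedy: the chosen move is stochastic, as described above.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, next_state):
        # Couple the reward from the move at T with the new state at T+1.
        r = self.reward(state, action, next_state)
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        self.q[(state, action)] += self.alpha * (
            r + self.gamma * best_next - self.q[(state, action)])
```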
What I'd like to know is whether there are any current Python libraries (preferably cross-platform, as I necessarily switch between Windoze and Linux environments) that can already do this sort of thing, or that could support it with suitable customization (e.g. derived-class support that allows redefining, say, the reward method with one's own).
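Continuing the hypothetical sketch above, the kind of derived-class customization I mean would look like:

```python
import random

class CoinBetAgent(OnlineMDPAgent):
    # Only the domain-specific payoff changes; everything else is inherited.
    def reward(self, state, action, next_state):
        return 1.0 if action == 'bet_' + next_state else -1.0

agent = CoinBetAgent(actions=['bet_heads', 'bet_tails'])
state = 'start'
for t in range(100):
    action = agent.act(state)                       # best-guess move at T
    next_state = random.choice(['heads', 'tails'])  # stand-in environment
    agent.update(state, action, next_state)         # fold in reward and state at T+1
    state = next_state
```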
I'm finding that information on online, per-move MDP learning is rather scarce. Most uses of MDPs that I can find seem to focus on solving the entire policy as a preprocessing step.
Keras is a Python library designed specifically for developing neural networks for ML models. It can run on top of Theano or TensorFlow to train those networks.
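For instance, a small network can be updated one observation at a time with train_on_batch; a minimal sketch (layer sizes and shapes are arbitrary placeholders, and nothing here is MDP-specific):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# A small network as a generic function approximator, e.g. for state values.
model = Sequential()
model.add(Dense(16, activation='relu', input_dim=4))  # 4 state features
model.add(Dense(2))                                   # one output per action
model.compile(optimizer='sgd', loss='mse')

state = np.random.rand(1, 4)         # placeholder state at time T
target = np.random.rand(1, 2)        # placeholder value targets
model.train_on_batch(state, target)  # single incremental update
```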
I am a grad student doing a lot of MCMC work in Python, and to my knowledge nothing implements MDPs directly. The closest thing I am aware of is PyMC. Digging around the documentation turned up this, which gives some advice on extending their classes. They definitely don't have rewards, etc., available out of the box.
If you're serious about developing something good, you might consider extending and subclassing the PyMC classes to build your decision processes; that way you could get it included in the next release of PyMC and help out a lot of future folks.
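As a rough illustration of the pattern, not a definitive implementation: keep a Beta belief over each action's payoff probability, re-sample it with PyMC after every outcome, and act on the posterior mean. This sketch assumes the PyMC 2 API, and the action/reward bookkeeping is entirely my own code, not something PyMC provides:

```python
import numpy as np
import pymc

successes = np.ones(3)  # pseudo-counts of observed payoffs per action
failures = np.ones(3)

def choose_action():
    # Posterior over each action's payoff probability, sampled with MCMC
    # (overkill for a conjugate Beta model, but it shows the machinery).
    p = pymc.Beta('p', alpha=successes, beta=failures)
    mcmc = pymc.MCMC([p])
    mcmc.sample(iter=2000, burn=500)
    return int(np.argmax(mcmc.trace('p')[:].mean(axis=0)))

action = choose_action()
outcome = 1                      # whatever the environment actually paid
successes[action] += outcome
failures[action] += 1 - outcome  # beliefs shift, so the policy keeps adapting
```

Sampling a conjugate Beta posterior with MCMC is overkill, but the same skeleton extends to models where the posterior has no closed form.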
Here is a Python toolbox for MDPs.
Caveat: it's for vanilla textbook MDPs, not for partially observable MDPs (POMDPs) or any kind of non-stationarity in rewards.
Second caveat: I found the documentation to be really lacking. You have to look at the Python source if you want to know what it implements, or you can quickly skim the documentation for the similar toolbox they provide for MATLAB.
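For what it's worth, assuming this is the pymdptoolbox package (its documentation points at the same MATLAB toolbox), the quickstart from its README looks like this; note that it solves the bundled forest-management example for a complete policy upfront, which is exactly the preprocessing style the question wants to move beyond:

```python
import mdptoolbox.example

# Transition probabilities P and rewards R for the built-in
# forest-management example, solved with value iteration.
P, R = mdptoolbox.example.forest()
vi = mdptoolbox.mdp.ValueIteration(P, R, 0.96)  # 0.96 = discount factor
vi.run()
print(vi.policy)  # one optimal action per state, e.g. (0, 0, 0)
```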