
Python libraries for on-line machine learning MDP

I am trying to devise an iterative Markov decision process (MDP) agent in Python with the following characteristics:

  • observable state
    • I handle a potential 'unknown' state by reserving part of the state space for answering query-type moves made by the agent (the state at t+1 identifies the previous query, or zero if the previous move was not a query, along with the embedded result vector); this space is padded with 0s to a fixed length so the state frame stays aligned regardless of which query was answered (their data lengths may vary)
  • actions that may not always be available at all states
  • reward function may change over time
  • policy convergence should be incremental and computed only per move

So the basic idea is that the MDP should make its best optimized move at T using its current probability model (and since it's probabilistic, the move it makes is expectedly stochastic, implying possible randomness), couple the new input state at T+1 with the reward from the previous move at T, and re-evaluate the model. Convergence must not be permanent, since the reward may modulate or the available actions could change.
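To make the requirements concrete, here is a minimal sketch of a per-move tabular Q-learning agent covering the points above: action selection is restricted to whatever actions are currently available, and a constant step size means the value estimates keep tracking a reward function that drifts over time. All class and method names here are hypothetical, not from any existing library:

```python
import random
from collections import defaultdict

class OnlineQAgent:
    """Sketch of an incremental, per-move agent (hypothetical, not a library API)."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)  # (state, action) -> estimated value
        self.alpha = alpha           # constant step size: never fully converges,
                                     # so it can track a changing reward function
        self.gamma = gamma           # discount factor
        self.epsilon = epsilon       # exploration rate

    def choose(self, state, available_actions):
        # Epsilon-greedy, restricted to the actions legal in this state.
        if random.random() < self.epsilon:
            return random.choice(available_actions)
        return max(available_actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, next_available):
        # One incremental backup per observed transition -- no full policy solve.
        best_next = max((self.q[(next_state, a)] for a in next_available),
                        default=0.0)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```

The loop at run time would be: `a = agent.choose(s, avail)`, act, observe `r` and `s'`, then `agent.update(s, a, r, s', avail')` before the next move.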

What I'd like to know is whether there are any current Python libraries (preferably cross-platform, as I necessarily change environments between Windoze and Linux) that can already do this sort of thing (or may support it with suitable customization, e.g. derived-class support that allows redefining, say, the reward method with one's own).

I'm finding that information about on-line, per-move MDP learning is rather scarce. Most uses of MDPs that I can find seem to focus on solving the entire policy as a preprocessing step.

Brian Jack asked Feb 05 '12


2 Answers

I am a grad student doing a lot of MCMC work in Python, and to my knowledge nothing implements MDPs directly. The closest thing I am aware of is PyMC. Digging around the documentation provided this, which gives some advice on extending their classes. They definitely don't have rewards, etc., available out of the box.

If you're serious about developing something good, you might consider extending and subclassing the PyMC classes to create your decision processes; then you could get it included in the next update of PyMC and help out lots of future folks.

ely answered Sep 22 '22

Here is a Python toolbox for MDPs.

Caveat: it's for vanilla textbook MDPs, not for partially observable MDPs (POMDPs) or any kind of non-stationarity in rewards.

Second caveat: I found the documentation to be really lacking. You have to look at the Python code if you want to know what it implements, or you can take a quick look at the documentation for the similar toolbox they have for MATLAB.
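For contrast with the question's per-move requirement, a "vanilla textbook MDP" of the kind this toolbox targets is solved offline, computing the whole policy up front. A sketch with made-up two-state, two-action dynamics in plain NumPy (not the toolbox's own API):

```python
import numpy as np

# Hypothetical dynamics: P[a][s][s'] = transition probability, R[s][a] = reward.
P = np.array([[[0.8, 0.2], [0.1, 0.9]],   # action 0
              [[0.5, 0.5], [0.3, 0.7]]])  # action 1
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

gamma = 0.9
V = np.zeros(2)
for _ in range(500):                        # value iteration to convergence
    Q = R + gamma * np.einsum('ast,t->sa', P, V)  # Q[s][a] backup over s'
    V = Q.max(axis=1)

policy = Q.argmax(axis=1)                   # entire policy fixed before acting
```

This is exactly the "solve the policy as a preprocessing step" pattern the question wants to avoid: the model P and R must be fully known and stationary before the agent makes a single move.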

kitchenette answered Sep 23 '22