I have implemented the value iteration algorithm for a simple Markov decision process (Wikipedia) in Python. To keep the structure (states, actions, transitions, rewards) of the particular Markov process and to iterate over it, I have used the following data structures:
a dictionary mapping each state to the actions available in it:
SA = {'state A': {'action 1', 'action 2', ...}, ...}
a dictionary for transition probabilities:
T = {('state A', 'action 1'): {'state B': probability, ...}, ...}
a dictionary for rewards:
R = {('state A', 'action 1'): {'state B': reward, ...}, ...}
My question is: is this the right approach? What are the most suitable data structures (in Python) for an MDP?
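For concreteness, a tiny two-state instance in these structures might look like this (the state and action names are made up):

# Hypothetical two-state MDP in the structures above.
SA = {
    'state A': {'action 1', 'action 2'},
    'state B': {'action 1'},
}

# T[(s, a)][s2] = probability of reaching s2 after taking a in s
T = {
    ('state A', 'action 1'): {'state A': 0.3, 'state B': 0.7},
    ('state A', 'action 2'): {'state B': 1.0},
    ('state B', 'action 1'): {'state A': 1.0},
}

# R[(s, a)][s2] = reward received for that transition
R = {
    ('state A', 'action 1'): {'state A': 0.0, 'state B': 1.0},
    ('state A', 'action 2'): {'state B': 0.5},
    ('state B', 'action 1'): {'state A': 0.0},
}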
An MDP is defined by (S, A, P, R, γ), where S is the set of states, A the set of actions, P the transition probabilities, R the rewards, and γ the discount factor. It is essentially an MRP (Markov reward process) with actions.
In mathematics, a Markov decision process (MDP) is a discrete-time stochastic control process. It provides a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker.
The two classical algorithms used to determine the optimal policy for an MDP are the policy iteration algorithm and the value iteration algorithm, which is presented below [8].
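A rough sketch of value iteration over the dictionary structures from the question could look like the following (gamma, the discount factor, and theta, the convergence threshold, are assumed parameters; this is illustrative rather than the listing cited above):

def value_iteration(SA, T, R, gamma=0.9, theta=1e-6):
    # Start with a value of zero for every state.
    V = {s: 0.0 for s in SA}
    while True:
        delta = 0.0
        for s, actions in SA.items():
            # Bellman optimality backup: best expected return over actions.
            v_new = max(
                sum(p * (R[(s, a)][s2] + gamma * V[s2])
                    for s2, p in T[(s, a)].items())
                for a in actions
            )
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        # Stop once no state value changes by more than theta.
        if delta < theta:
            return V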
I have implemented Markov decision processes in Python before and found the following code useful.
http://aima.cs.berkeley.edu/python/mdp.html
This code is taken from Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig.
Whether a data structure is suitable or not mostly depends on what you do with the data. You mention that you want to iterate over the process, so optimize your data structure for this purpose.
Transitions in Markov processes are often modeled by matrix multiplications. The transition probabilities P_a(s1, s2) and the rewards R_a(s1, s2) could be described by (potentially sparse) matrices P_a and R_a indexed by the states. I think this would have a few advantages, for example vectorizing the value-iteration update and reusing sparse linear-algebra routines.
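A sketch of that representation with NumPy (the states are assumed to be enumerated 0..n-1 and the actions 0..m-1; the numbers are illustrative):

import numpy as np

n_states, n_actions = 2, 2

# P[a, s, s2] = probability of moving from s to s2 under action a;
# R[a, s, s2] = reward for that transition. Dense arrays here, but
# scipy.sparse matrices work for large, mostly-empty MDPs.
P = np.zeros((n_actions, n_states, n_states))
R = np.zeros((n_actions, n_states, n_states))

P[0, 0] = [0.3, 0.7]      # action 0 from state 0
P[0, 1] = [1.0, 0.0]      # action 0 from state 1
P[1] = np.eye(n_states)   # action 1 keeps the current state
R[0, 0, 1] = 1.0          # reward for the 0 -> 1 transition under action 0

gamma = 0.9
V = np.zeros(n_states)
for _ in range(100):
    # One Bellman backup for all states at once:
    # Q[a, s] = sum over s2 of P[a, s, s2] * (R[a, s, s2] + gamma * V[s2])
    Q = (P * (R + gamma * V)).sum(axis=2)
    V = Q.max(axis=0)

With this layout a full sweep of value iteration is a handful of vectorized array operations instead of nested Python loops.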