
Data structure for Markov Decision Process [closed]

I have implemented the value iteration algorithm for a simple Markov decision process (Wikipedia) in Python. In order to keep the structure (states, actions, transitions, rewards) of the particular Markov process and to iterate over it, I have used the following data structures:

  1. a dictionary mapping states to the actions available in those states:

    SA = {'state A': {'action 1', 'action 2', ...}, ...}

  2. a dictionary for transition probabilities:

    T = {('state A', 'action 1'): {'state B': probability}, ...}

  3. a dictionary for rewards:

    R = {('state A', 'action 1'): {'state B': reward}, ...}
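
For concreteness, a minimal toy example of these structures with value iteration sweeps over them might look like this (all state/action names, numbers, and the discount factor are placeholders, not taken from the real problem):

    # A toy two-state MDP stored in the dictionaries described above.
    SA = {'state A': {'action 1', 'action 2'},
          'state B': {'action 1'}}

    T = {('state A', 'action 1'): {'state A': 0.3, 'state B': 0.7},
         ('state A', 'action 2'): {'state A': 1.0},
         ('state B', 'action 1'): {'state A': 0.4, 'state B': 0.6}}

    R = {('state A', 'action 1'): {'state A': 0.0, 'state B': 5.0},
         ('state A', 'action 2'): {'state A': 1.0},
         ('state B', 'action 1'): {'state A': 0.0, 'state B': 2.0}}

    gamma = 0.9                   # discount factor (assumed for the example)
    V = {s: 0.0 for s in SA}      # initial value estimates

    # Repeated synchronous sweeps of the Bellman optimality backup:
    # V(s) <- max_a sum_{s'} T[s, a][s'] * (R[s, a][s'] + gamma * V[s'])
    for _ in range(100):
        V = {s: max(sum(p * (R[s, a][s2] + gamma * V[s2])
                        for s2, p in T[s, a].items())
                    for a in SA[s])
             for s in SA}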

My question is: is this the right approach? What are the most suitable data structures (in Python) for an MDP?

asked Dec 20 '12 by JackAW

People also ask

What are the five essential parameters that define Markov decision process?

An MDP is defined by the tuple (S, A, P, R, γ): the set of states S, the set of actions A, the transition probabilities P, the rewards R, and the discount factor γ. It is essentially a Markov reward process (MRP) extended with actions.

How does the Markov decision process work?

In mathematics, a Markov decision process (MDP) is a discrete-time stochastic control process. It provides a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker.

What are the various techniques used by an agent to choose the best policy in MDP model?

The two classical algorithms used to determine the optimal policy for an MDP are the policy iteration algorithm and the value iteration algorithm [8].
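
Written out, the value iteration update is the standard Bellman optimality backup (γ is the discount factor, Pa and Ra the transition probabilities and rewards for action a):

    V_{k+1}(s) = \max_{a \in A(s)} \sum_{s'} P_a(s, s') \left[ R_a(s, s') + \gamma \, V_k(s') \right]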


2 Answers

I have implemented Markov Decision Processes in Python before and found the following code useful.

http://aima.cs.berkeley.edu/python/mdp.html

This code is taken from Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig.

answered Sep 26 '22 by Xiong Yiliang


Whether a data structure is suitable or not mostly depends on what you do with the data. You mention that you want to iterate over the process, so optimize your data structure for this purpose.

Transitions in Markov processes are often modeled by matrix multiplications. The transition probabilities Pa(s1,s2) and the rewards Ra(s1,s2) could be described by (potentially sparse) matrices Pa and Ra indexed by the states. I think this would have a few advantages:

  • If you use numpy arrays for this, indexing will probably be faster than with the dictionaries.
  • State transitions could then be described simply by matrix multiplication.
  • Process simulation with, for example, roulette wheel selection will be faster and easier to implement, since you simply need to pick the corresponding column of the transition matrix (see the sketch below).
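
A minimal sketch of this idea (three states indexed 0..2, one action a, all numbers invented; using the layout P_a[s, s'] = Pa(s, s'), so with this orientation the current state selects a row of the matrix, or a column if you store the transpose):

    import numpy as np

    # P_a[s, s2] = Pa(s, s2): probability of moving from state s to s2 under action a.
    # R_a[s, s2] = Ra(s, s2): reward received for that transition. Numbers are made up.
    P_a = np.array([[0.8, 0.2, 0.0],
                    [0.1, 0.6, 0.3],
                    [0.0, 0.0, 1.0]])
    R_a = np.array([[0.0, 1.0, 0.0],
                    [0.0, 0.0, 5.0],
                    [0.0, 0.0, 0.0]])

    # Expected immediate reward for taking action a in each state:
    r_a = (P_a * R_a).sum(axis=1)

    # Propagating a distribution over states one step is a single matrix product:
    d = np.array([1.0, 0.0, 0.0])              # start in state 0 with certainty
    d_next = d @ P_a

    # Roulette-wheel selection of the next state when the current state is s:
    s = 0
    s_next = np.random.choice(len(P_a), p=P_a[s])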
answered Sep 25 '22 by silvado