
When to use a certain Reinforcement Learning algorithm?

I'm studying Reinforcement Learning and reading Sutton's book for a university course. Besides the classic DP, MC, TD and Q-Learning algorithms, I'm reading about policy gradient methods and genetic algorithms for solving decision problems. I have no prior experience with this topic and I'm having trouble understanding when one technique should be preferred over another. I have a few ideas, but I'm not sure about them. Can someone briefly explain, or point me to a source that describes the typical situations in which a given method should be used? As far as I understand:

  • Dynamic Programming and Linear Programming should be used only when the MDP has few actions and states and the model is known, since they are very expensive. But when is DP better than LP?
  • Monte Carlo methods are used when I don't have a model of the problem but I can generate samples. They have no bias but high variance.
  • Temporal Difference methods should be used when MC methods need too many samples to have low variance. But when should I use TD and when Q-Learning?
  • Policy Gradient and Genetic algorithms are good for continuous MDPs. But when is one better than the other?

More precisely, I think that to choose a learning method a programmer should ask himself the following questions:

  • does the agent learn online or offline?
  • can we separate exploring and exploiting phases?
  • can we perform enough exploration?
  • is the horizon of the MDP finite or infinite?
  • are states and actions continuous?

But I don't know how these details of the problem affect the choice of a learning method. I hope that a programmer who already has some experience with RL methods can help me better understand their applications.

Simon asked Mar 28 '14 21:03




1 Answer

Briefly:

does the agent learn online or offline? This helps you decide between on-line and off-line algorithms. A closely related distinction is on-policy versus off-policy learning (e.g. on-policy: SARSA, off-policy: Q-learning). On-line methods have more limitations and need more care.
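
To make the SARSA / Q-learning contrast concrete, here is a minimal sketch of the two tabular update rules. The function names, the dictionary-based Q-table and the default values of `alpha` and `gamma` are illustrative assumptions, not something taken from the question. In standard terminology SARSA is on-policy and Q-learning is off-policy: the only difference is which next-state value the target bootstraps from.

```python
from collections import defaultdict

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy: bootstrap from the action the agent will actually take next."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy: bootstrap from the greedy action, whatever is taken next."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

# Hypothetical Q-table: unseen (state, action) pairs default to 0.0.
Q = defaultdict(float)
```

Because Q-learning evaluates the greedy policy regardless of how the data was generated, it can learn from transitions collected by a different (e.g. purely random) behaviour policy, which is what makes it a natural fit for previously collected data.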

can we separate exploring and exploiting phases? These two phases are normally kept in balance. For example, in epsilon-greedy action selection you explore (take a random action) with probability epsilon and exploit (take the greedy action) with probability 1-epsilon. You can also separate the two and have the algorithm explore first (e.g. by choosing random actions) and exploit afterwards. But this situation is possible when you are learning off-line, probably using a model of the dynamics of the system, and it normally means collecting a lot of sample data in advance.
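
A minimal sketch of epsilon-greedy action selection, under the usual convention that epsilon is the probability of exploring. The function name and the list-of-values representation are assumptions made for illustration.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest estimated value.
    return max(range(len(q_values)), key=q_values.__getitem__)
```

Separating the phases then amounts to calling this with `epsilon=1.0` while collecting data and `epsilon=0.0` afterwards, whereas the balanced version keeps a small fixed or decaying epsilon throughout.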

can we perform enough exploration? The level of exploration depends on the definition of the problem. For example, if you have a simulation model of the problem in memory, you can explore as much as you want. But exploration on the real system is limited by the resources you have (e.g. energy, time, ...).

are states and actions continuous? Answering this helps you choose the right approach (algorithm). RL algorithms have been developed for both discrete and continuous spaces. Some "continuous" algorithms internally discretize the state or action spaces.
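
As an illustration of the discretization mentioned above, the following sketch maps one continuous state variable to a bin index so that a tabular method can be applied. Uniform binning, the function name and the parameters are assumptions; real algorithms may use tile coding or other schemes.

```python
def discretize(x, low, high, n_bins):
    """Map a continuous value in [low, high] to a bin index in 0..n_bins-1."""
    x = min(max(x, low), high)  # clamp values outside the range
    idx = int((x - low) / (high - low) * n_bins)
    return min(idx, n_bins - 1)  # x == high belongs to the last bin
```

The number of bins trades resolution against table size: with several state variables the table grows as the product of the per-variable bin counts.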

NKN answered Sep 18 '22 00:09