
How can I apply reinforcement learning to continuous action spaces?

I'm trying to get an agent to learn the mouse movements necessary to best perform some task in a reinforcement learning setting (i.e. the reward signal is the only feedback for learning).

I'm hoping to use the Q-learning technique, but while I've found a way to extend this method to continuous state spaces, I can't seem to figure out how to accommodate a problem with a continuous action space.

I could just force all mouse movement to be of a certain magnitude and in only a certain number of different directions, but any reasonable way of making the actions discrete would yield a huge action space. Since standard Q-learning requires the agent to evaluate all possible actions, such an approximation doesn't solve the problem in any practical sense.
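To make the size concern concrete, here is a minimal sketch of that discretisation. The helper names (`discretize_mouse_actions`, `greedy_action`) and the particular direction and magnitude counts are illustrative assumptions, not part of any library. It shows how quickly the action set grows and why the greedy step in standard Q-learning has to score every entry.

```python
import math
from itertools import product

def discretize_mouse_actions(n_directions, magnitudes):
    """Return one (dx, dy) displacement per direction/magnitude pair."""
    actions = []
    for k, r in product(range(n_directions), magnitudes):
        theta = 2.0 * math.pi * k / n_directions
        actions.append((r * math.cos(theta), r * math.sin(theta)))
    return actions

def greedy_action(q_value, state, actions):
    """Standard Q-learning selection: evaluate Q(state, a) for every action a."""
    return max(actions, key=lambda a: q_value(state, a))

# A coarse grid is cheap to scan but very limited in what it can express.
coarse = discretize_mouse_actions(8, [5.0])
# A finer grid (1-degree steps, five step sizes) is already 1800 actions.
fine = discretize_mouse_actions(360, [1.0, 2.0, 5.0, 10.0, 20.0])
print(len(coarse), len(fine))
```

Every call to `greedy_action` costs one Q evaluation per action, so the per-step cost scales with the resolution of the grid.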

zergylord asked Aug 17 '11 19:08



People also ask

What is continuous action space?

In a continuous action space, your agent must output some real-valued number, possibly in multiple dimensions. A good example is the continuous MountainCar problem, where the agent must output the force to apply as a real number.

Can DQN be used in continuous action space?

An environment with a continuous action space looks something like Pendulum-v0. It can be solved to some degree with DQN by discretising the action space (e.g. into 9 different actions). However, an actor-critic algorithm such as A3C can find better solutions.
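As a rough sketch of that discretisation, the snippet below maps 9 discrete action indices onto evenly spaced torques. The torque range of [-2, 2] is an assumption about Pendulum-v0, and `make_action_table` is a hypothetical helper rather than part of any RL library.

```python
def make_action_table(low, high, n_actions):
    """Evenly spaced continuous actions, one per discrete index."""
    step = (high - low) / (n_actions - 1)
    return [low + i * step for i in range(n_actions)]

# Assumed torque bounds for Pendulum-v0; adjust to the environment's action space.
torques = make_action_table(-2.0, 2.0, 9)

def to_continuous(action_index):
    """Translate the DQN's discrete output into the torque sent to the environment."""
    return torques[action_index]

print(torques)
```

The DQN then has 9 output units, and `to_continuous` converts its argmax into the real-valued action the environment expects.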


1 Answer

The common way of dealing with this problem is with actor-critic methods, which extend naturally to continuous action spaces. Basic Q-learning can diverge when combined with function approximation. If you still want to use it, you can try combining it with a self-organizing map, as done in "Applications of the self-organising map to reinforcement learning". The paper also contains some further references you might find useful.
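To illustrate why actor-critic methods need no enumeration of actions, here is a minimal sketch on a toy single-state problem. Everything in it is an assumption made for illustration: the reward `-(a - target)**2`, the fixed-width Gaussian policy, the learning rates, and the function name `train_actor_critic`. It is not the method from the paper above. With one state and one-step episodes, the critic reduces to a single value estimate and the TD error to `reward - value`.

```python
import random

def train_actor_critic(target=1.5, sigma=0.5, steps=5000,
                       lr_actor=0.01, lr_critic=0.1, seed=0):
    """One-step actor-critic with a Gaussian policy on a single-state task."""
    rng = random.Random(seed)
    mu = 0.0     # actor: mean of the Gaussian policy over actions
    value = 0.0  # critic: estimated expected reward of the single state
    for _ in range(steps):
        action = rng.gauss(mu, sigma)       # sample a real-valued action
        reward = -(action - target) ** 2    # toy reward, peaked at the target
        td_error = reward - value           # episode ends after one step
        value += lr_critic * td_error       # critic update
        # Actor update: TD error times d/dmu of log N(action | mu, sigma).
        mu += lr_actor * td_error * (action - mu) / sigma ** 2
    return mu, value

mu, value = train_actor_critic()
print(mu, value)
```

The actor outputs an action directly by sampling, so no maximisation over actions is ever performed. In a real task, `mu` and `value` would be function approximators that take the state as input.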

Don Reba answered Sep 23 '22 11:09
