Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Learning rate of a Q learning agent

The question how the learning rate influences the convergence rate and convergence itself. If the learning rate is constant, will Q function converge to the optimal on or learning rate should necessarily decay to guarantee convergence?

like image 840
uduck Avatar asked Oct 08 '15 09:10

uduck


People also ask

What is the learning rate in Q-learning?

- the learning rate, set between 0 and 1. Setting it to 0 means that the Q-values are never updated, hence nothing is learned. Setting a high value such as 0.9 means that learning can occur quickly.

What is the agent in Q-learning?

A Q-learning agent is a value-based reinforcement learning agent that trains a critic to estimate the return or future rewards. For a given observation, the agent selects and outputs the action for which the estimated return is greatest.

Why is Q-learning slow?

The main reason for the slow convergence of Q-learning is the combination of the sample-based stochastic approximation (that makes use of a decaying learning rate) and the fact that the Bellman operator propagates information throughout the whole space (specially when γ is close to 1).

How is learning rate determined?

The learning rate is related to the step length determined by inexact line search in quasi-Newton methods and related optimization algorithms. When conducting line searches, mini-batch sub-sampling (MBSS) affect the characteristics of the loss function along which the learning rate needs to be resolved.


1 Answers

Learning rate tells the magnitude of step that is taken towards the solution.

It should not be too big a number as it may continuously oscillate around the minima and it should not be too small of a number else it will take a lot of time and iterations to reach the minima.

The reason why decay is advised in learning rate is because initially when we are at a totally random point in solution space we need to take big leaps towards the solution and later when we come close to it, we make small jumps and hence small improvements to finally reach the minima.

Analogy can be made as: in the game of golf when the ball is far away from the hole, the player hits it very hard to get as close as possible to the hole. Later when he reaches the flagged area, he choses a different stick to get accurate short shot.

So its not that he won't be able to put the ball in the hole without choosing the short shot stick, he may send the ball ahead of the target two or three times. But it would be best if he plays optimally and uses the right amount of power to reach the hole. Same is for decayed learning rate.

like image 60
VishalTheBeast Avatar answered Sep 28 '22 08:09

VishalTheBeast