 

Which multiplication and addition factors to use when doing adaptive learning rate in neural networks?

I am new to neural networks and, to get a grip on the matter, I have implemented a basic feed-forward MLP which I currently train through back-propagation. I am aware that there are more sophisticated and better ways to do that, but in Introduction to Machine Learning they suggest that, with one or two tricks, basic gradient descent can be effective for learning from real-world data. One of the tricks is an adaptive learning rate.

The idea is to increase the learning rate by a constant value a when the error gets smaller, and decrease it by a fraction b of the learning rate when the error gets larger. So basically the learning rate change is determined by:

+(a)

if we're learning in the right direction, and

-(b * <learning rate>)

if we're ruining our learning. However, in the above book there's no advice on how to set these parameters. I wouldn't expect a precise suggestion, since parameter tuning is a whole topic on its own, but at least a hint about their order of magnitude would help. Any ideas?
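To make the rule concrete, here is a minimal sketch of it on a toy 1-D "error surface". Note that the values of a and b below are illustrative placeholders (this is exactly what the question is asking about), not recommendations:

```python
# Illustrative constants -- the book gives no concrete values,
# so these are placeholders, not recommendations.
a = 0.001  # additive increase when the error decreases
b = 0.5    # fraction of the learning rate cut when the error increases

def f(w):          # toy "network error": a simple quadratic
    return w * w

def grad(w):       # its gradient
    return 2.0 * w

w, lr = 5.0, 0.1
prev_error = f(w)
for _ in range(100):
    w -= lr * grad(w)          # plain gradient-descent update
    error = f(w)
    if error < prev_error:     # learning in the right direction:
        lr += a                #   lr <- lr + a
    else:                      # error went up, we overshot:
        lr -= b * lr           #   lr <- lr - b * lr
    prev_error = error

print(w, lr)  # w ends up very close to 0
```

The asymmetry (additive increase, multiplicative decrease) is deliberate: the rate creeps up slowly while things go well, but is cut back sharply the moment the error rises.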

Thank you,
Tunnuz

asked Sep 08 '11 08:09 by tunnuz


People also ask

How do you use adaptive learning rate?

Adaptive learning rate methods are variants of gradient descent that adjust the step size during training, using the gradients and parameters of the network, with the goal of minimizing the objective function more reliably.

Which learning algorithm calculates adaptive learning rates for each parameter?

RMSprop: In this algorithm, we keep a different learning rate for each parameter, but the computation process differs. We first compute the gradients of the weights, then maintain a running average of their recent squared magnitudes. From this average, we derive the effective learning rate for each parameter.

How do you choose the best learning rate?

There are multiple ways to select a good starting point for the learning rate. A naive approach is to try a few different values and see which one gives you the best loss without sacrificing speed of training. We might start with a large value like 0.1, then try exponentially lower values: 0.01, 0.001, etc.
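That naive sweep can be sketched in a few lines; the toy problem and training budget here are stand-ins for a real model and training run:

```python
def final_loss(lr, steps=50):
    w = 5.0                    # toy 1-D problem: minimize w**2
    for _ in range(steps):
        w -= lr * 2.0 * w      # gradient step
    return w * w

# exponentially spaced candidates, as described above
candidates = [0.1, 0.01, 0.001, 0.0001]
losses = {lr: final_loss(lr) for lr in candidates}
best = min(losses, key=losses.get)
print(best)
```

On a real network you would also watch for divergence (loss turning into NaN) at the large end of the sweep, not just the final loss value.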

What is the idea of an adaptive learning rate? Describe some adaptive learning rate methods.

The challenge of using learning rate schedules is that their hyperparameters have to be defined in advance, and they depend heavily on the type of model and problem. Another problem is that the same learning rate is applied to all parameter updates. Adaptive methods address this by adjusting the learning rate, often separately for each parameter, as training progresses.


1 Answer

I haven't looked at neural networks for the longest time (10+ years), but after I saw your question I thought I would have a quick scout about. I kept seeing the same figures all over the internet for the increase (a) and decrease (b) factors: 1.2 and 0.5, respectively.

I have managed to track these values down to Martin Riedmiller and Heinrich Braun's RPROP algorithm (1992). Riedmiller and Braun are quite specific about sensible parameters to choose.

See: RPROP: A Fast Adaptive Learning Algorithm
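For illustration, here is a minimal sketch of that RPROP-style step-size adaptation using the 1.2 and 0.5 factors; the step-size bounds are the commonly cited defaults, and the full algorithm's weight-backtracking detail is omitted:

```python
import numpy as np

ETA_PLUS, ETA_MINUS = 1.2, 0.5     # the increase/decrease factors
DELTA_MAX, DELTA_MIN = 50.0, 1e-6  # commonly cited step-size bounds

def rprop_step(w, g, prev_g, delta):
    same_sign = g * prev_g
    # gradient kept its sign: accelerate this weight's step size
    delta = np.where(same_sign > 0,
                     np.minimum(delta * ETA_PLUS, DELTA_MAX), delta)
    # gradient changed sign: we overshot, shrink the step size
    delta = np.where(same_sign < 0,
                     np.maximum(delta * ETA_MINUS, DELTA_MIN), delta)
    # update uses only the gradient's sign, not its magnitude
    w = w - np.sign(g) * delta
    return w, delta

w = np.array([4.0, -3.0])
delta = np.full_like(w, 0.1)       # per-weight step sizes
prev_g = np.zeros_like(w)
for _ in range(100):
    g = 2 * w                      # gradient of the toy loss sum(w**2)
    w, delta = rprop_step(w, g, prev_g, delta)
    prev_g = g
print(w)
```

The key difference from the question's scheme is that RPROP adapts a separate step size per weight, keyed on gradient sign changes rather than on the global error.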

I hope this helps.

answered Oct 23 '22 03:10 by Mark McLaren