
Need good way to choose and adjust a "learning rate"

In the picture below you can see a learning algorithm trying to learn to produce a desired output (the red line). The learning algorithm is similar to a neural network trained with backward error propagation (backpropagation).

The "learning rate" is a value that controls the size of the adjustments made during the training process. If the learning rate is too high, then the algorithm learns quickly but its predictions jump around a lot during the training process (green line - learning rate of 0.001), if it is lower then the predictions jump around less, but the algorithm takes a lot longer to learn (blue line - learning rate of 0.0001).

The black lines are moving averages.

How can I adapt the learning rate so that the algorithm converges quickly toward the desired output initially, but then slows down so that it can home in on the correct value?

learning rate graph http://img.skitch.com/20090605-pqpkse1yr1e5r869y6eehmpsym.png

asked Jun 05 '09 by sanity


1 Answer

Sometimes the process of decreasing the learning rate over time is called "annealing" the learning rate.

There are many possible "annealing schedules", such as having the learning rate decay in inverse proportion to time:

u(t) = c / t

...where c is some constant. Or there is the "search-then-converge" schedule:

u(t) = A * (1 + (c/A)*(t/T)) / 
           (1 + (c/A)*(t/T) + T*(t^2)/(T^2))

...which keeps the learning rate around A when t is small compared to T (the "search" phase) and then decreases the learning rate when t is large compared to T (the "converge" phase). Of course, for both of these approaches you have to tune parameters (e.g. c, A, or T) but hopefully introducing them will help more than it will hurt. :)
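
As an illustration, here is a small Python sketch of both schedules (my own, not from the answer); the parameter values are arbitrary assumptions chosen only to show the shapes:

    # Sketch of the two annealing schedules above. The parameter values
    # are arbitrary assumptions, chosen only to illustrate the behaviour.

    def inverse_time(t, c=1.0):
        # u(t) = c / t -- the rate decays from the very first step
        return c / t

    def search_then_converge(t, A=0.01, c=1.0, T=100.0):
        # u(t) = A * (1 + (c/A)*(t/T)) / (1 + (c/A)*(t/T) + T*(t^2)/(T^2))
        # stays near A while t << T ("search"), then decays like c/t ("converge")
        numerator = 1.0 + (c / A) * (t / T)
        denominator = numerator + T * (t ** 2) / (T ** 2)
        return A * numerator / denominator

    for t in (1, 10, 100, 1000, 10000):
        print(t, inverse_time(t), search_then_converge(t))

In a training loop you would evaluate u(t) at each step (or epoch) and use it in place of a fixed learning rate when applying the weight updates.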

Some references:

  • Learning Rate Schedules for Faster Stochastic Gradient Search, Christian Darken, Joseph Chang and John Moody, Neural Networks for Signal Processing 2 --- Proceedings of the 1992 IEEE Workshop, IEEE Press, Piscataway, NJ, 1992.
  • A Stochastic Approximation Method, Herbert Robbins and Sutton Monro, Annals of Mathematical Statistics 22, #3 (September 1951), pp. 400–407.
  • Neural Networks and Learning Machines (section 3.13 in particular), Simon S. Haykin, 3rd edition (2008), ISBN 0131471392 (ISBN-13: 9780131471399).
  • Here is a page that briefly discusses learning rate adaptation.
answered by Nate Kohl