In the picture below you can see a learning algorithm trying to learn to produce a desired output (the red line). The learning algorithm is similar to a backward error propagation neural network.
The "learning rate" is a value that controls the size of the adjustments made during the training process. If the learning rate is too high, the algorithm learns quickly but its predictions jump around a lot during training (green line, learning rate of 0.001). If it is lower, the predictions jump around less, but the algorithm takes much longer to learn (blue line, learning rate of 0.0001).
The black lines are moving averages.
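As a rough illustration of this trade-off (a toy stand-in, not the actual algorithm behind the graph), here is a minimal Python sketch. It nudges a single estimate toward noisy observations of a target value. The function name `train`, the target, the noise level, and the two rates (0.1 and 0.01, larger than the ones in the graph so the effect shows up in few steps) are all assumptions chosen for illustration.

```python
import random
import statistics


def train(learning_rate, target=10.0, noise=1.0, steps=5000, seed=0):
    """Nudge a single estimate toward noisy observations of `target`.

    Returns the estimate after every step so the trajectory can be inspected.
    """
    rng = random.Random(seed)
    estimate = 0.0
    history = []
    for _ in range(steps):
        observed = target + rng.gauss(0.0, noise)  # noisy training signal
        error = observed - estimate
        estimate += learning_rate * error          # adjustment scales with the rate
        history.append(estimate)
    return history


if __name__ == "__main__":
    fast = train(learning_rate=0.1)   # hypothetical "high" rate
    slow = train(learning_rate=0.01)  # hypothetical "low" rate
    for name, run in (("fast", fast), ("slow", slow)):
        print(name,
              "after 50 steps:", round(run[49], 3),
              "spread over last 1000 steps:",
              round(statistics.pstdev(run[-1000:]), 3))
```

With the higher rate the estimate gets close to the target within a few dozen steps but keeps jittering; with the lower rate it is still far away after 50 steps but ends up much steadier.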
How can I adapt the learning rate so that it gets close to the desired output quickly at first, but then slows down so that it can home in on the correct value?
[Figure: learning rate graph, http://img.skitch.com/20090605-pqpkse1yr1e5r869y6eehmpsym.png]
Sometimes the process of decreasing the learning rate over time is called "annealing" the learning rate.
There are many possible "annealing schedules", like having the learning rate decay in inverse proportion to time:
u(t) = c / t
...where c is some constant.  Or there is the "search-then-converge" schedule:
u(t) = A * (1 + (c/A)*(t/T)) / 
           (1 + (c/A)*(t/T) + T*(t^2)/(T^2))
...which keeps the learning rate around A when t is small compared to T (the "search" phase) and then decreases the learning rate when t is large compared to T (the "converge" phase).  Of course, for both of these approaches you have to tune parameters (e.g. c, A, or T) but hopefully introducing them will help more than it will hurt.  :)
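Both schedules are short to write down in code. Here is a minimal Python sketch, transcribed directly from the two formulas above. The function names `inverse_time_rate` and `search_then_converge_rate`, the guard against t < 1, and the parameter values in the usage lines are my own assumptions for illustration, not something taken from a particular library.

```python
def inverse_time_rate(t, c):
    """u(t) = c / t, defined for t >= 1 (the first update is t = 1)."""
    if t < 1:
        raise ValueError("t must be >= 1")
    return c / t


def search_then_converge_rate(t, A, c, T):
    """Search-then-converge schedule, transcribed from the formula above.

    Equals A at t = 0 and decays roughly like c / t once t is very large.
    """
    ratio = (c / A) * (t / T)
    return A * (1 + ratio) / (1 + ratio + T * (t ** 2) / (T ** 2))


if __name__ == "__main__":
    # Illustrative values only; c = A * T puts the bend of the curve near t = T.
    A, c, T = 0.001, 1.0, 1000
    for t in (1, 10, 100, 1000, 10000, 100000):
        print(t, inverse_time_rate(t, c), search_then_converge_rate(t, A, c, T))
```

In a training loop you would call one of these once per update (or once per epoch) with the current step count and use the returned value as the learning rate for that update.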