Is it worth changing the learning rate after certain conditions are met? And how and why would you do it? For example, the network starts with a high learning rate, and once the squared error is low enough the learning rate drops for better precision, or the learning rate increases to jump out of a local minimum. Wouldn't that cause over-fitting? And what about momentum?
Generally, a large learning rate allows the model to learn faster, at the cost of arriving at a sub-optimal final set of weights. A smaller learning rate may allow the model to learn a more optimal or even globally optimal set of weights, but it may take significantly longer to train.
If your learning rate is set too low, training will progress very slowly as you are making very tiny updates to the weights in your network. However, if your learning rate is set too high, it can cause undesirable divergent behavior in your loss function.
Furthermore, the learning rate affects how quickly the model can converge to a local minimum (i.e., arrive at the best achievable accuracy). Getting it right from the get-go means less time spent training the model.
The learning rate controls how quickly the model is adapted to the problem. Smaller learning rates require more training epochs given the smaller changes made to the weights each update, whereas larger learning rates result in rapid changes and require fewer training epochs.
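To make the trade-off concrete, here is a minimal sketch of plain gradient descent on a toy one-dimensional objective, f(w) = (w - 3)^2; the function and the specific learning-rate values are purely illustrative and are not from the answers above. It shows how a small rate makes slow progress, a moderate rate converges quickly, and a too-large rate diverges.

```python
# Toy gradient descent on f(w) = (w - 3)^2 (illustrative only).
# The learning rate scales every weight update, so it directly controls
# how fast (or whether) the iterates approach the minimum at w = 3.
def train(learning_rate, epochs=50):
    w = 0.0                        # initial weight
    for _ in range(epochs):
        grad = 2.0 * (w - 3.0)     # derivative of (w - 3)^2
        w -= learning_rate * grad  # larger rate => bigger step per update
    return w

print(train(learning_rate=0.01))  # small rate: still far from 3 after 50 epochs
print(train(learning_rate=0.1))   # moderate rate: converges close to 3
print(train(learning_rate=1.1))   # too large: the updates overshoot and diverge
```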
Usually you should start with a high learning rate and a low momentum. Then you decrease the learning rate over time and increase the momentum. The idea is to allow more exploration at the beginning of training and to force convergence at the end. Usually you should look at the training error to set up your learning-rate schedule: if it gets stuck, i.e. the error does not change, it is time to decrease your learning rate. A sketch of such a schedule is shown below.