Which multiplication and addition factor to use when doing adaptive learning rate in neural networks?

Tags:

neural-network

backpropagation

I am new to neural networks and, to get a grip on the subject, I have implemented a basic feed-forward MLP, which I currently train with back-propagation. I am aware that there are better and more sophisticated ways to do that, but Introduction to Machine Learning suggests that, with one or two tricks, basic gradient descent can be effective for learning from real-world data. One of those tricks is an adaptive learning rate.

The idea is to increase the learning rate by a constant value a when the error gets smaller, and decrease it by a fraction b of the learning rate when the error gets larger. So basically the learning rate change is determined by:

+(a)

if we're learning in the right direction, and

-(b * <learning rate>)

if we're ruining our learning. However, the book gives no advice on how to set these parameters. I wouldn't expect a precise suggestion, since parameter tuning is a whole topic of its own, but at least a hint about their order of magnitude. Any ideas?
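For concreteness, the rule described above could be sketched as follows. This is only a minimal illustration: the function name `adapt_learning_rate` and the example values of `a` and `b` are hypothetical placeholders, not values recommended by the book.

```python
def adapt_learning_rate(lr, prev_error, curr_error, a, b):
    """Adjust the learning rate after comparing two consecutive epoch errors.

    a: constant added when the error decreased (placeholder value, an assumption)
    b: fraction of the current rate removed when the error increased (assumption)
    """
    if curr_error < prev_error:
        # Error got smaller: increase the rate by the constant a.
        return lr + a
    if curr_error > prev_error:
        # Error got larger: decrease the rate by the fraction b of itself.
        return lr - b * lr
    # Error unchanged: leave the rate as it is.
    return lr


# Hypothetical usage with made-up error values.
lr = 0.1
lr = adapt_learning_rate(lr, prev_error=1.0, curr_error=0.8, a=0.01, b=0.5)
print(lr)
```

The increase is additive while the decrease is proportional, so a single bad epoch cuts the rate faster than a single good epoch raises it.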

Thank you,
Tunnuz

asked Sep 08 '11 08:09

tunnuz

1 Answer

I haven't looked at neural networks in a very long time (10+ years), but after seeing your question I had a quick look around. I kept seeing the same figures all over the internet for the increase (a) and decrease (b) factors: 1.2 and 0.5 respectively.

I have managed to track these values down to Martin Riedmiller and Heinrich Braun's RPROP algorithm (1992). Riedmiller and Braun are quite specific about sensible parameters to choose.

See: RPROP: A Fast Adaptive Learning Algorithm
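One caveat when borrowing these numbers: in RPROP both 1.2 and 0.5 are multiplicative factors, applied to a separate step size for each weight and driven by whether the gradient's sign changed, not by whether the total error went up or down. A rough sketch of that step-size update for a single weight is below. The function name is hypothetical, the step-size bounds are the commonly cited defaults and should be treated as assumptions, and the weight update itself is omitted.

```python
ETA_PLUS = 1.2    # increase factor mentioned above
ETA_MINUS = 0.5   # decrease factor mentioned above
STEP_MAX = 50.0   # upper bound on the step size (assumed default)
STEP_MIN = 1e-6   # lower bound on the step size (assumed default)


def rprop_step_size(step, prev_grad, curr_grad):
    """Return the new step size for one weight, RPROP style (sketch only)."""
    product = prev_grad * curr_grad
    if product > 0:
        # Gradient kept its sign: grow the step, capped at STEP_MAX.
        return min(step * ETA_PLUS, STEP_MAX)
    if product < 0:
        # Gradient changed sign (a minimum was overshot): shrink the step.
        return max(step * ETA_MINUS, STEP_MIN)
    # One of the gradients is zero: keep the step unchanged.
    return step


# Hypothetical usage with made-up gradient values.
print(rprop_step_size(0.1, prev_grad=0.3, curr_grad=0.2))
print(rprop_step_size(0.1, prev_grad=0.3, curr_grad=-0.2))
```

So 0.5 maps fairly directly onto b as a fraction of the current rate, but 1.2 is not directly comparable to an additive constant a.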

I hope this helps.

answered Oct 23 '22 03:10

Mark McLaren
