How is learning rate decay implemented by Adam in Keras?

I set learning rate decay for my Adam optimizer, such as

LR = 1e-3
LR_DECAY = 1e-2
OPTIMIZER = Adam(lr=LR, decay=LR_DECAY)

As the Keras documentation for Adam states, after each epoch the learning rate would be

lr = lr * (1. / (1. + self.decay * K.cast(self.iterations, K.dtype(self.decay))))

If I understand correctly, the learning rate would be like this,

lr = lr * 1 / ( 1 + num_epoch * decay)

But I don't see the learning rate decay coming into effect after printing it out. Is there a problem with how I am using this?

Edit
I print out the learning rate by setting verbose on the ReduceLROnPlateau callback, such as,

ReduceLROnPlateau(monitor='val_loss', factor=0.75, patience=Config.REDUCE_LR_PATIENCE, verbose=1, mode='auto', epsilon=0.01, cooldown=0, min_lr=1e-6)

That callback monitors val_loss and reduces the learning rate by multiplying it by the factor. The printed learning rate looks like this,

Epoch 00003: ReduceLROnPlateau reducing learning rate to 0.0007500000356230885.

I set the initial learning rate to 1e-3, so it appears the learning rate only changed from 1e-3 to 1e-3 * 0.75 via that callback. This makes me suspect that the decay I set in Adam isn't working.

asked Aug 16 '19 by yujuezhao


1 Answer

The learning rate changes with every iteration, i.e., with every batch, not every epoch. So, if you set decay = 1e-2 and each epoch has 100 batches/iterations, then after 1 epoch your learning rate will be

lr = init_lr * 1/(1 + 1e-2 * 100)
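
For a concrete check, here is a quick sketch of that calculation (using the question's 1e-3 initial rate together with the decay = 1e-2 and 100 iterations from above):

# illustrative numbers: initial learning rate, decay, and one epoch's worth of batches
init_lr = 1e-3
decay = 1e-2
iterations = 100

# effective learning rate after one epoch, following the Keras decay formula
lr = init_lr * (1. / (1. + decay * iterations))
print(lr)  # 0.0005 -- the learning rate has already halved after a single epoch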

So, if I want my learning rate to be 0.75 of the original learning rate at the end of the first epoch (note that with this 1/(1 + decay * iterations) schedule, later epochs shrink the rate more slowly than a fixed 0.75-per-epoch factor would), I would set the lr_decay to

batches_per_epoch = dataset_size/batch_size
lr_decay = (1./0.75 -1)/batches_per_epoch
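
As a sketch of how this could be wired up (the dataset_size, batch_size, and 1e-3 initial rate below are made-up illustrative values; the lr/decay argument names assume the older keras 2.x Adam used in the question):

from keras.optimizers import Adam

# hypothetical dataset/batch numbers, purely for illustration
dataset_size = 50000
batch_size = 32
batches_per_epoch = dataset_size / batch_size

# decay that brings the rate down to 0.75 * init_lr by the end of the first epoch
lr_decay = (1. / 0.75 - 1.) / batches_per_epoch

optimizer = Adam(lr=1e-3, decay=lr_decay)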

It seems to work for me. Also, since the new learning rate is recomputed at every iteration, the optimizer never changes the value of the learning rate variable itself; it always starts from the initial learning rate and derives the effective (decayed) rate on the fly. That is why callbacks such as ReduceLROnPlateau still see and print the initial 1e-3 rather than the decayed value.
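
If you want to see the decayed value that is actually applied, one option is a small custom callback that recomputes it from the optimizer's variables. This is only a sketch, assuming the older keras 2.x Adam from the question (which exposes lr, decay and iterations as backend variables); the EffectiveLRLogger name is just made up here:

import keras.backend as K
from keras.callbacks import Callback

class EffectiveLRLogger(Callback):
    def on_epoch_end(self, epoch, logs=None):
        opt = self.model.optimizer
        lr = K.get_value(opt.lr)                  # the stored lr variable (stays at its initial value)
        decay = K.get_value(opt.decay)
        iterations = K.get_value(opt.iterations)  # total batches seen so far
        # reproduce the per-iteration decay formula to get the effective rate
        effective_lr = lr * (1. / (1. + decay * iterations))
        print('Epoch %d: effective lr = %.6g' % (epoch + 1, effective_lr))

# usage (hypothetical model/data): model.fit(x, y, callbacks=[EffectiveLRLogger()])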

answered Oct 20 '22 by ssaz_5