 

Adam optimizer with warmup on PyTorch

In the paper Attention Is All You Need, under section 5.3, the authors suggest increasing the learning rate linearly for the first warmup steps and then decreasing it proportionally to the inverse square root of the step number.

The schedule from the paper: lrate = d_model^(-0.5) * min(step_num^(-0.5), step_num * warmup_steps^(-1.5))

How do we implement this in PyTorch with the Adam optimizer? Preferably without additional packages.

asked Dec 30 '22 by Jingles

1 Answer

PyTorch provides learning-rate schedulers for adjusting the learning rate during training. Several simple LR schedulers are already implemented and can be found here: https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate

In your case you can, just like the built-in LR schedulers do, subclass _LRScheduler to implement a schedule that varies with the number of steps. For a bare-bones scheduler you only need to implement the __init__() and get_lr() methods.
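A minimal sketch of such a subclass, implementing the warmup schedule from the paper (the class name NoamLR and its constructor arguments are chosen here for illustration, not taken from PyTorch):

    import torch
    from torch.optim.lr_scheduler import _LRScheduler

    class NoamLR(_LRScheduler):
        """Linear warmup followed by inverse-square-root decay,
        as described in section 5.3 of "Attention Is All You Need"."""

        def __init__(self, optimizer, d_model, warmup_steps, last_epoch=-1):
            # Set attributes before calling super().__init__(),
            # because the base class calls get_lr() during initialization.
            self.d_model = d_model
            self.warmup_steps = warmup_steps
            super().__init__(optimizer, last_epoch)

        def get_lr(self):
            # self.last_epoch counts how often .step() has been called;
            # clamp to 1 to avoid dividing by zero on the first call.
            step = max(self.last_epoch, 1)
            scale = self.d_model ** -0.5 * min(step ** -0.5,
                                               step * self.warmup_steps ** -1.5)
            # With the optimizer's lr set to 1.0, the schedule alone
            # determines the effective learning rate.
            return [base_lr * scale for base_lr in self.base_lrs]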

Just note that many of these schedulers expect .step() to be called once per epoch. You can also call it more frequently, e.g. once per batch, or even pass a custom argument, just like the cosine-annealing LR scheduler does: https://pytorch.org/docs/stable/_modules/torch/optim/lr_scheduler.html#CosineAnnealingLR
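If you prefer not to subclass at all, the built-in LambdaLR scheduler can apply the same formula, as long as you call .step() once per optimizer step rather than per epoch. A rough sketch, where the linear layer is a stand-in for your model and d_model=512, warmup_steps=4000, betas=(0.9, 0.98), eps=1e-9 are the values used in the paper:

    import torch
    from torch.optim import Adam
    from torch.optim.lr_scheduler import LambdaLR

    d_model, warmup_steps = 512, 4000        # values from the paper; adjust to your model

    def noam_lambda(step):
        step = max(step, 1)                  # avoid dividing by zero on the first call
        return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

    model = torch.nn.Linear(d_model, d_model)    # stand-in for your transformer
    # lr=1.0 so that the lambda's value is the effective learning rate
    optimizer = Adam(model.parameters(), lr=1.0, betas=(0.9, 0.98), eps=1e-9)
    scheduler = LambdaLR(optimizer, lr_lambda=noam_lambda)

    for step in range(10):                   # in practice: loop over your batches
        optimizer.zero_grad()
        loss = model(torch.randn(8, d_model)).pow(2).mean()
        loss.backward()
        optimizer.step()
        scheduler.step()                     # step the schedule once per batch, not per epoch
        print(step, scheduler.get_last_lr())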

answered Jan 02 '23 by flawr