In the paper "Attention Is All You Need", section 5.3, the authors suggest increasing the learning rate linearly over a number of warmup steps and then decreasing it proportionally to the inverse square root of the step number.
How do we implement this in PyTorch with the Adam optimizer, preferably without additional packages?
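For reference, the schedule in section 5.3 is (with `warmup_steps = 4000` in the paper):

```python
lrate = d_model ** -0.5 * min(step_num ** -0.5, step_num * warmup_steps ** -1.5)
```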
PyTorch provides learning-rate schedulers for adjusting the learning rate during training. Several simple LR schedulers are already implemented and can be found here: https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
In your case you can, just like the built-in LR schedulers do, subclass `_LRScheduler` to implement a variable schedule based on the number of epochs. For a bare-bones version you only need to implement the `__init__()` and `get_lr()` methods.
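For example, a bare-bones subclass implementing the warmup/inverse-square-root schedule might look like the sketch below (the class name `NoamLR` and the choice to reuse `last_epoch` as a per-step counter are my own assumptions, not something the PyTorch docs prescribe):

```python
import torch
from torch.optim.lr_scheduler import _LRScheduler


class NoamLR(_LRScheduler):
    """Linear warmup followed by inverse-square-root decay (section 5.3).

    Hypothetical sketch: scales each param group's base lr by
    d_model**-0.5 * min(step**-0.5, step * warmup_steps**-1.5).
    """

    def __init__(self, optimizer, d_model, warmup_steps, last_epoch=-1):
        self.d_model = d_model
        self.warmup_steps = warmup_steps
        super().__init__(optimizer, last_epoch)

    def get_lr(self):
        # `last_epoch` is incremented on every `.step()` call; here it is
        # used as a training-step counter, so call `.step()` once per batch.
        step = max(1, self.last_epoch)
        scale = self.d_model ** -0.5 * min(step ** -0.5, step * self.warmup_steps ** -1.5)
        return [base_lr * scale for base_lr in self.base_lrs]
```

If the base learning rate passed to the optimizer is 1.0, this reproduces the paper's formula exactly; any other base lr simply rescales it.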
Just note that many of these schedulers expect you to call `.step()` once per epoch, but you can also call it more frequently (e.g. once per batch) or even pass a custom argument, just like the cosine-annealing LR scheduler does: https://pytorch.org/docs/stable/_modules/torch/optim/lr_scheduler.html#CosineAnnealingLR
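A minimal per-step usage sketch, assuming the `NoamLR` class from above and a dummy model (all names and sizes here are placeholders):

```python
import torch

model = torch.nn.Linear(512, 512)                         # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=1.0,  # lr=1.0 so the scheduler alone sets the rate
                             betas=(0.9, 0.98), eps=1e-9)  # Adam settings from the paper
scheduler = NoamLR(optimizer, d_model=512, warmup_steps=4000)

for step in range(5):                                     # stands in for iterating over batches
    x = torch.randn(32, 512)
    loss = model(x).pow(2).mean()                         # dummy loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                                      # advance the learning rate every batch
    print(step, scheduler.get_last_lr())
```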