In most PyTorch code, the Adam optimizer is defined as follows:
optim = torch.optim.Adam(model.parameters(), lr=cfg['lr'], weight_decay=cfg['weight_decay'])
However, after repeated trials, I found that the following definition of Adam gives about 1.5 dB higher PSNR, which is a huge improvement.
optim = torch.optim.Adam(
    [
        {'params': get_parameters(model, bias=False)},
        {'params': get_parameters(model, bias=True), 'lr': cfg['lr'] * 2, 'weight_decay': 0},
    ],
    lr=cfg['lr'],
    weight_decay=cfg['weight_decay'])
The model is a standard U-Net, with its layers defined in __init__ and a forward method, as in any other PyTorch model.
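For concreteness, here is a minimal sketch of what such a model could look like; the layer names and sizes are illustrative assumptions, not the actual network:

import torch.nn as nn

class TinyUNet(nn.Module):
    # hypothetical stand-in for the actual U-Net
    def __init__(self):
        super().__init__()
        # top-level Conv2d / ConvTranspose2d children, visible via model._modules
        self.enc = nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1)
        self.dec = nn.ConvTranspose2d(16, 3, kernel_size=2, stride=2)

    def forward(self, x):
        return self.dec(self.enc(x))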
get_parameters is defined as follows:
import torch.nn as nn

def get_parameters(model, bias=False):
    # iterate over the model's direct (top-level) submodules only
    for k, m in model._modules.items():
        print("get_parameters", k, type(m), type(m).__name__, bias)
        if bias:
            # yield only the biases of Conv2d layers
            if isinstance(m, nn.Conv2d):
                yield m.bias
        else:
            # yield the weights of Conv2d and ConvTranspose2d layers
            if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
                yield m.weight
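As a quick sanity check (using the hypothetical TinyUNet sketched above), you can list which tensors each call yields; note that model._modules only contains direct children, so convolutions nested inside nn.Sequential or other sub-blocks would not be picked up:

model = TinyUNet()

weights = list(get_parameters(model, bias=False))  # enc.weight and dec.weight
biases = list(get_parameters(model, bias=True))    # enc.bias only (Conv2d biases)

print([tuple(w.shape) for w in weights])  # [(16, 3, 3, 3), (16, 3, 2, 2)]
print([tuple(b.shape) for b in biases])   # [(16,)]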
Could someone explain why the second definition performs so much better than the first one?
In the second method, different settings are used for updating the weights and the biases. This is done through the optimizer's per-parameter options.
optim = torch.optim.Adam(
    [
        {'params': get_parameters(model, bias=False)},
        {'params': get_parameters(model, bias=True), 'lr': cfg['lr'] * 2, 'weight_decay': 0},
    ],
    lr=cfg['lr'],
    weight_decay=cfg['weight_decay'])
With this configuration, the learning rate for the biases is twice that for the weights, and weight decay is disabled for the biases.
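A minimal sketch of how to verify this, assuming the optimizer above has already been constructed: each entry in optim.param_groups carries its own lr and weight_decay, and any option not given in a group's dict is filled in from the defaults passed to the constructor.

for i, group in enumerate(optim.param_groups):
    print(f"group {i}: lr={group['lr']}, weight_decay={group['weight_decay']}, "
          f"num_params={len(group['params'])}")
# with, say, cfg['lr'] = 1e-3 and cfg['weight_decay'] = 1e-4, this prints:
# group 0: lr=0.001, weight_decay=0.0001, num_params=...
# group 1: lr=0.002, weight_decay=0, num_params=...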
The reason for doing this could be that the network was not learning properly with the uniform settings. For more background, see: Why is the learning rate for the bias usually twice as large as the LR for the weights?