What is the best way to define the Adam optimizer in PyTorch?

In most PyTorch code, the Adam optimizer is defined as follows:

optim = torch.optim.Adam(model.parameters(), lr=cfg['lr'], weight_decay=cfg['weight_decay'])

However, after repeated trials, I found that the following definition of Adam gives 1.5 dB higher PSNR, which is huge:

optim = torch.optim.Adam(
            [
                {'params': get_parameters(model, bias=False)},
                {'params': get_parameters(model, bias=True), 'lr': cfg['lr'] * 2, 'weight_decay': 0},
            ],
            lr=cfg['lr'],
            weight_decay=cfg['weight_decay'])

The model is a standard U-Net, with parameters defined in __init__ and the forward pass in forward, as in any other PyTorch model.

get_parameters is defined as follows:

import torch.nn as nn

def get_parameters(model, bias=False):
    # Iterate over the model's direct child modules only
    # (nested submodules are not visited).
    for k, m in model._modules.items():
        print("get_parameters", k, type(m), type(m).__name__, bias)
        if bias:
            # Yield only the biases of Conv2d layers.
            if isinstance(m, nn.Conv2d):
                yield m.bias
        else:
            # Yield the weights of Conv2d and ConvTranspose2d layers.
            if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
                yield m.weight
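
For context, here is a quick way to sanity-check which tensors the generator yields. This is a minimal sketch that reuses the get_parameters above; the two-layer Sequential model is a hypothetical stand-in for the actual U-Net:

import torch
import torch.nn as nn

# Hypothetical stand-in for the real U-Net, just to exercise get_parameters.
model = nn.Sequential()
model.add_module('conv1', nn.Conv2d(3, 16, 3, padding=1))
model.add_module('up1', nn.ConvTranspose2d(16, 3, 2, stride=2))

weights = list(get_parameters(model, bias=False))
biases = list(get_parameters(model, bias=True))
print(len(weights))  # 2: conv1.weight and up1.weight
print(len(biases))   # 1: only conv1.bias; the bias branch skips ConvTranspose2d,
                     # so up1.bias would not be passed to the optimizer at all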

Could someone explain why the latter definition is better than the previous one?

asked Oct 15 '25 by Mohit Lamba


1 Answer

In the second method, weights and biases are given different update configurations using the optimizer's per-parameter options (parameter groups):

optim = torch.optim.Adam(
            [
                {'params': get_parameters(model, bias=False)},
                {'params': get_parameters(model, bias=True), 'lr': cfg['lr'] * 2, 'weight_decay': 0},
            ],
            lr=cfg['lr'],
            weight_decay=cfg['weight_decay'])

With this setup, the learning rate for the biases is twice that of the weights, and their weight decay is 0, so the biases are not regularized.
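
An equivalent way to build these parameter groups is to split on parameter names via named_parameters(), which also covers nested submodules, unlike the _modules loop above. This is a minimal sketch under the same cfg settings, not necessarily the asker's exact helper (note it splits every parameter named bias, including any normalization layers):

import torch

weights = [p for n, p in model.named_parameters() if not n.endswith('bias')]
biases = [p for n, p in model.named_parameters() if n.endswith('bias')]

optim = torch.optim.Adam(
    [
        {'params': weights},
        {'params': biases, 'lr': cfg['lr'] * 2, 'weight_decay': 0},
    ],
    lr=cfg['lr'],
    weight_decay=cfg['weight_decay'])

# Each group keeps its own hyperparameters; unset keys fall back to the defaults.
for i, group in enumerate(optim.param_groups):
    print(i, group['lr'], group['weight_decay'])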

As for why this is done: one possible reason is that the network was not learning properly with a single uniform setting. For more detail, see: Why is the learning rate for the bias usually twice as large as the LR for the weights?

answered Oct 16 '25 by kHarshit