I'm fine-tuning a ResNet-50 in PyTorch and want to set the learning rate of the last fully connected layer to 1e-3 while the learning rate of all other layers is set to 1e-6. I know I can follow the method in the documentation:
optim.SGD([{'params': model.base.parameters()},
           {'params': model.classifier.parameters(), 'lr': 1e-3}],
          lr=1e-2, momentum=0.9)
But is there any way to do this without setting the parameters layer by layer?
In that example, model.base's parameters will use the default learning rate of 1e-2, model.classifier's parameters will use a learning rate of 1e-3, and a momentum of 0.9 will be used for all parameters.
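To see these group settings concretely, here is a minimal runnable sketch; the `Net` class is a hypothetical stand-in with `base` and `classifier` submodules, since the documentation snippet assumes a model with those attributes:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model with "base" and "classifier" parts,
# mirroring the structure assumed by the documentation example.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.base = nn.Sequential(nn.Linear(8, 8), nn.ReLU())
        self.classifier = nn.Linear(8, 2)

model = Net()
optimizer = torch.optim.SGD(
    [{'params': model.base.parameters()},                  # default lr
     {'params': model.classifier.parameters(), 'lr': 1e-3}],  # override
    lr=1e-2, momentum=0.9)

# The first group inherits the default lr; the second keeps its override,
# and the momentum applies to both groups.
for group in optimizer.param_groups:
    print(group['lr'], group['momentum'])
```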
A discriminative learning rate is when you train a neural net with different learning rates for different layers.
You can group layers by type. If you want to group all linear layers, the simplest way is to iterate over model.modules() and test each module's type:
param_grp = []
for m in model.modules():
    if isinstance(m, nn.Linear):
        param_grp.append(m.weight)
        if m.bias is not None:
            param_grp.append(m.bias)  # include the bias as well
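Building on that loop, here is a full sketch that collects every nn.Linear parameter into one group, puts everything else into a base group, and passes both to SGD. The model here is a hypothetical stand-in; the same grouping code works unchanged for a torchvision ResNet-50 (where the final layer is `model.fc`):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model; the same grouping works for resnet50.
model = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3), nn.ReLU(),
                      nn.Flatten(), nn.Linear(8 * 30 * 30, 10))

# Collect every nn.Linear parameter (weights and biases).
linear_params = []
for m in model.modules():
    if isinstance(m, nn.Linear):
        linear_params.extend(m.parameters())

# Everything else goes into the base group.
linear_ids = {id(p) for p in linear_params}
base_params = [p for p in model.parameters() if id(p) not in linear_ids]

optimizer = torch.optim.SGD(
    [{'params': base_params},                 # uses the default lr below
     {'params': linear_params, 'lr': 1e-3}],  # higher lr for linear layers
    lr=1e-6, momentum=0.9)
```

Since param groups are just lists of tensors, this scales to any predicate you like (by module type, by name prefix, etc.) without naming layers one by one.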