Is this a good learning rate for the Adam method?

I am training my model and got the result shown below. Is this a good learning rate? If not, is it too high or too low?

[training loss curve]

This is my solver configuration:

lr_policy: "step"
gamma: 0.1
stepsize: 10000
power: 0.75
# lr for unnormalized softmax
base_lr: 0.001
# high momentum
momentum: 0.99
# no gradient accumulation
iter_size: 1
max_iter: 100000
weight_decay: 0.0005
snapshot: 4000
snapshot_prefix: "snapshot/train"
type: "Adam"
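For reference, under Caffe's "step" policy the effective rate is base_lr * gamma^floor(iter / stepsize); the power field is only read by other policies such as "inv" and "poly", so it should be inert here (and, if I read Caffe's Adam solver correctly, the momentum field plays the role of Adam's beta1). A minimal sketch of the schedule these settings produce, in plain Python written for illustration:

base_lr = 0.001
gamma = 0.1
stepsize = 10000
max_iter = 100000

def step_lr(it):
    # Caffe "step" policy: drop the rate by a factor of gamma
    # every stepsize iterations.
    return base_lr * gamma ** (it // stepsize)

for it in range(0, max_iter + 1, stepsize):
    print(f"iter {it:6d}: lr = {step_lr(it):.1e}")
# Starts at 1.0e-03 and drops 10x every 10000 iterations.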

This is the reference I am going by:

"With low learning rates the improvements will be linear. With high learning rates they will start to look more exponential. Higher learning rates will decay the loss faster, but they get stuck at worse values of loss."

Asked Mar 23 '17 by John


People also ask

Is learning rate important for Adam?

Adam's learning rate may need tuning, and Adam is not necessarily the best algorithm. There is also research suggesting it can be beneficial to use learning rate schedules other than Adam's built-in adaptation. So it is not that simple; Adam by itself isn't necessarily enough.

What is a good learning rate value?

A traditional default value for the learning rate is 0.1 or 0.01, and this may be a good starting point for your problem.

What is the default learning rate for Adam?

It defaults to 0.001. In Keras, the learning_rate argument accepts a float, a LearningRateSchedule, or a callable that takes no arguments and returns the actual value to use.
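A quick check of that default with TensorFlow's Keras optimizer (which the wording above appears to describe), assuming TensorFlow is installed:

import tensorflow as tf

opt = tf.keras.optimizers.Adam()          # learning_rate left unspecified
print(opt.get_config()["learning_rate"])  # -> 0.001, the documented default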

Does Adam have learning rate decay?

Further, learning rate decay can also be used with Adam. The paper uses a decay rate of alpha = alpha/sqrt(t), updated each epoch (t), for its logistic regression demonstration. The TensorFlow documentation also suggests some tuning of epsilon: the default value of 1e-8 for epsilon might not be a good default in general.
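A minimal sketch of that per-epoch decay (alpha / sqrt(t)); the base rate here is an illustrative value, not one taken from the paper:

import math

alpha0 = 0.001  # illustrative base rate

def decayed_lr(t):
    # alpha_t = alpha / sqrt(t), updated once per epoch t (1-indexed),
    # so the first epoch uses alpha0 itself.
    return alpha0 / math.sqrt(t)

for t in range(1, 6):
    print(f"epoch {t}: lr = {decayed_lr(t):.2e}")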


1 Answer

The learning rate looks a bit high. The curve decreases too fast for my taste and flattens out very soon. If I wanted additional performance, I would try 0.0005 or 0.0001 as the base learning rate. You can stop after a few epochs anyway if you see that it is not working.
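If you want to try those base rates without editing the prototxt by hand, here is a minimal sketch using pycaffe's protobuf bindings; the file names are assumptions for illustration, not from the question:

from caffe.proto import caffe_pb2
from google.protobuf import text_format

# Parse the existing solver definition ("solver.prototxt" is an assumed path).
param = caffe_pb2.SolverParameter()
with open("solver.prototxt") as f:
    text_format.Merge(f.read(), param)

# Write one solver file per candidate base learning rate.
for lr in (0.0005, 0.0001):
    param.base_lr = lr
    with open(f"solver_lr{lr}.prototxt", "w") as f:
        f.write(text_format.MessageToString(param))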

The question you have to ask yourself, though, is how much performance you need and how close you are to the performance required. Presumably you are training the network for a specific purpose. Often you can get more performance out of it by increasing its capacity, rather than fine-tuning a learning rate that is already pretty good, if not perfect.

Answered Oct 17 '22 by Thomas Pinetz