I'm just trying to find out how to use Caffe. To do so, I took a look at the different .prototxt files in the examples folder. There is one option I don't understand:
# The learning rate policy
lr_policy: "inv"
Possible values seem to be:
"fixed"
"inv"
"step"
"multistep"
"stepearly"
"poly"
Could somebody please explain those options?
A solver is a routine for finding exact numerical solutions to determined systems, for example using Newton-Raphson to find the root(s) of a function. When a system is overdetermined, one generally uses approximate solutions instead, for example regression.
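For concreteness, the Newton-Raphson iteration mentioned above refines a root estimate of a function f by repeatedly applying the standard textbook update (nothing Caffe-specific here):

x_{n+1} = x_n - f(x_n) / f'(x_n)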
SGD: stochastic gradient descent (type: "SGD") updates the weights by a linear combination of the negative gradient and the previous weight update. The learning rate is the weight of the negative gradient; the momentum is the weight of the previous update.
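In the notation of Caffe's solver documentation, with learning rate alpha and momentum mu, the update of the weights W and the momentum buffer V is:

V_{t+1} = mu * V_t - alpha * grad L(W_t)
W_{t+1} = W_t + V_{t+1}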
It is a common practice to decrease the learning rate (lr) as the optimization/learning process progresses. However, it is not clear how exactly the learning rate should be decreased as a function of the iteration number.
If you use DIGITS as an interface to Caffe, you will be able to visually see how the different choices affect the learning rate.
fixed: the learning rate is kept fixed throughout the learning process.
inv: the learning rate decays roughly as ~1/T (see the exact formulas after this list).
step: the learning rate is piecewise constant, dropping by a fixed factor every X iterations.
multistep: piecewise constant, dropping at arbitrary, user-defined intervals.
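The exact formulas behind these policies are documented in the comments of caffe.proto; gamma, power, stepsize, and stepvalue are parameters set in the same solver .prototxt:

fixed:     lr = base_lr
step:      lr = base_lr * gamma ^ floor(iter / stepsize)
inv:       lr = base_lr * (1 + gamma * iter) ^ (-power)
multistep: like step, but the drops happen at the iterations listed in stepvalue
poly:      lr = base_lr * (1 - iter / max_iter) ^ power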
You can see exactly how the learning rate is computed in the function SGDSolver<Dtype>::GetLearningRate (solvers/sgd_solver.cpp, around line 30).
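If you want to experiment with these schedules outside of Caffe, here is a minimal standalone C++ sketch of the same logic. It is a simplified re-implementation for illustration only, not Caffe's actual code (the real function is a member of SGDSolver and reads its parameters from the solver proto); multistep and the more exotic policies are omitted for brevity:

#include <cmath>
#include <cstdio>
#include <string>

// Simplified re-implementation (for illustration) of the formulas used by
// SGDSolver<Dtype>::GetLearningRate. base_lr, gamma, power, and stepsize
// mirror the fields of the solver .prototxt.
double GetLearningRate(const std::string& lr_policy, int iter,
                       double base_lr, double gamma, double power,
                       int stepsize) {
  if (lr_policy == "fixed") {
    return base_lr;  // constant throughout training
  } else if (lr_policy == "step") {
    // piecewise constant: drop by a factor of gamma every stepsize iterations
    return base_lr * std::pow(gamma, std::floor(double(iter) / stepsize));
  } else if (lr_policy == "inv") {
    // smooth ~1/T decay controlled by gamma and power
    return base_lr * std::pow(1.0 + gamma * iter, -power);
  }
  return base_lr;  // unknown policy: fall back to base_lr in this sketch
}

int main() {
  // Print how the "inv" schedule from the LeNet example solver
  // (base_lr: 0.01, gamma: 0.0001, power: 0.75) decays over training.
  for (int iter = 0; iter <= 10000; iter += 1000) {
    std::printf("iter %5d  lr %.6f\n", iter,
                GetLearningRate("inv", iter, 0.01, 0.0001, 0.75, 0));
  }
  return 0;
}

Compiled with e.g. g++ -o lr lr.cpp, this prints how the LeNet example's "inv" schedule decays over the first 10,000 iterations.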
Recently, I came across an interesting and unconventional approach to learning-rate tuning: Leslie N. Smith's work "No More Pesky Learning Rate Guessing Games". In that report, Smith suggests using an lr_policy that alternates between decreasing and increasing the learning rate, and also shows how to implement this policy in Caffe.