I'm just trying to find out how to use Caffe. To do so, I took a look at the different .prototxt files in the examples folder. There is one option I don't understand:
# The learning rate policy
lr_policy: "inv"
Possible values seem to be:
"fixed"
"inv"
"step"
"multistep"
"stepearly"
"poly"
Could somebody please explain those options?
A solver is a routine for finding exact numerical solutions to determined systems, for example using Newton-Raphson to find the root(s) of a function. When a system is overdetermined, one generally uses approximate solutions instead, for example regression.
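For concreteness, the Newton-Raphson iteration mentioned above refines a root estimate of a function f by repeatedly applying the standard textbook update (nothing Caffe-specific here):

x_{n+1} = x_n - f(x_n) / f'(x_n)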
SGD: stochastic gradient descent (type: "SGD") updates the weights by a linear combination of the negative gradient and the previous weight update. The learning rate is the weight of the negative gradient; the momentum is the weight of the previous update.
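In the notation of Caffe's solver documentation, with learning rate alpha and momentum mu, the update of the weights W and the momentum buffer V is:

V_{t+1} = mu * V_t - alpha * grad L(W_t)
W_{t+1} = W_t + V_{t+1}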
It is a common practice to decrease the learning rate (lr) as the optimization/learning process progresses. However, it is not clear how exactly the learning rate should be decreased as a function of the iteration number.
If you use DIGITS as an interface to Caffe, you will be able to visually see how the different choices affect the learning rate.
fixed: the learning rate is kept fixed throughout the learning process.
inv: the learning rate decays roughly as ~1/T (see the exact formulas after this list).
step: the learning rate is piecewise constant, dropping by a fixed factor every X iterations.
multistep: piecewise constant, dropping at arbitrary, user-defined intervals.
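The exact formulas behind these policies are documented in the comments of caffe.proto; gamma, power, stepsize, and stepvalue are parameters set in the same solver .prototxt:

fixed:     lr = base_lr
step:      lr = base_lr * gamma ^ floor(iter / stepsize)
inv:       lr = base_lr * (1 + gamma * iter) ^ (-power)
multistep: like step, but the drops happen at the iterations listed in stepvalue
poly:      lr = base_lr * (1 - iter / max_iter) ^ power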
You can see exactly how the learning rate is computed in the function SGDSolver<Dtype>::GetLearningRate (solvers/sgd_solver.cpp, around line 30).
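If you want to experiment with these schedules outside of Caffe, here is a minimal standalone C++ sketch of the same logic. It is a simplified re-implementation for illustration only, not Caffe's actual code (the real function is a member of SGDSolver and reads its parameters from the solver proto); multistep and the more exotic policies are omitted for brevity:

#include <cmath>
#include <cstdio>
#include <string>

// Simplified re-implementation (for illustration) of the formulas used by
// SGDSolver<Dtype>::GetLearningRate. base_lr, gamma, power, and stepsize
// mirror the fields of the solver .prototxt.
double GetLearningRate(const std::string& lr_policy, int iter,
                       double base_lr, double gamma, double power,
                       int stepsize) {
  if (lr_policy == "fixed") {
    return base_lr;  // constant throughout training
  } else if (lr_policy == "step") {
    // piecewise constant: drop by a factor of gamma every stepsize iterations
    return base_lr * std::pow(gamma, std::floor(double(iter) / stepsize));
  } else if (lr_policy == "inv") {
    // smooth ~1/T decay controlled by gamma and power
    return base_lr * std::pow(1.0 + gamma * iter, -power);
  }
  return base_lr;  // unknown policy: fall back to base_lr in this sketch
}

int main() {
  // Print how the "inv" schedule from the LeNet example solver
  // (base_lr: 0.01, gamma: 0.0001, power: 0.75) decays over training.
  for (int iter = 0; iter <= 10000; iter += 1000) {
    std::printf("iter %5d  lr %.6f\n", iter,
                GetLearningRate("inv", iter, 0.01, 0.0001, 0.75, 0));
  }
  return 0;
}

Compiled with e.g. g++ -o lr lr.cpp, this prints how the LeNet example's "inv" schedule decays over the first 10,000 iterations.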
Recently, I came across an interesting and unconventional approach to learning-rate tuning: Leslie N. Smith's work "No More Pesky Learning Rate Guessing Games". In that report, Smith suggests using an lr_policy that alternates between decreasing and increasing the learning rate, and also shows how to implement this policy in Caffe.