Looking at an example <code>'solver.prototxt'</code>, posted on BVLC/caffe git, there is a training meta parameter <pre class="prettyprint"><code>weight_decay: 0.04 </code></pre> What does this meta parameter mean? And what value should I assign to it?

The <code>weight_decay</code> meta parameter govern the regularization term of the neural net. During training a regularization term is added to the network's loss to compute the backprop gradient. The <code>weight_decay</code> value determines how dominant this regularization term will be in the gradient computation. As a rule of thumb, the more training examples you have, the weaker this term should be. The more parameters you have (i.e., deeper net, larger filters, larger InnerProduct layers etc.) the higher this term should be. Caffe also allows you to choose between <code>L2</code> regularization (default) and <code>L1</code> regularization, by setting <pre class="prettyprint"><code>regularization_type: "L1" </code></pre> However, since in most cases weights are small numbers (i.e., <code>-1<w<1</code>), the <code>L2</code> norm of the weights is significantly smaller than their <code>L1</code> norm. Thus, if you choose to use <code>regularization_type: "L1"</code> you might need to tune <code>weight_decay</code> to a significantly smaller value. While learning rate may (and usually does) change during training, the regularization weight is fixed throughout.

Weight decay is a regularization term that penalizes big weights. When the weight decay coefficient is big the penalty for big weights is also big, when it is small weights can freely grow. Look at this answer (not specific to caffe) for a better explanation: Difference between neural net "weight decay" and "learning rate".

What is `weight_decay` meta parameter in Caffe?

Tags:

Looking at an example 'solver.prototxt', posted on BVLC/caffe git, there is a training meta parameter

weight_decay: 0.04

What does this meta parameter mean? And what value should I assign to it?

388

asked Aug 24 '15 08:08

Shai

2 Answers

The weight_decay meta parameter govern the regularization term of the neural net.

During training a regularization term is added to the network's loss to compute the backprop gradient. The weight_decay value determines how dominant this regularization term will be in the gradient computation.

As a rule of thumb, the more training examples you have, the weaker this term should be. The more parameters you have (i.e., deeper net, larger filters, larger InnerProduct layers etc.) the higher this term should be.

Caffe also allows you to choose between L2 regularization (default) and L1 regularization, by setting

regularization_type: "L1"

However, since in most cases weights are small numbers (i.e., -1<w<1), the L2 norm of the weights is significantly smaller than their L1 norm. Thus, if you choose to use regularization_type: "L1" you might need to tune weight_decay to a significantly smaller value.

While learning rate may (and usually does) change during training, the regularization weight is fixed throughout.

118

answered Sep 20 '22 21:09

Shai

Weight decay is a regularization term that penalizes big weights. When the weight decay coefficient is big the penalty for big weights is also big, when it is small weights can freely grow.

Look at this answer (not specific to caffe) for a better explanation: Difference between neural net "weight decay" and "learning rate".

answered Sep 20 '22 21:09

Tal Darom

Related questions
                            
                                JUnit: NoClassDefFoundError: org/junit/runner/manipulation/Filter
                            
                                How to delete last path component of a String in Swift?
                            
                                NetBeans 8.1 Activation Failure
                            
                                swift invalidate timer doesn't work
                            
                                Import CSV file to Laravel controller and insert data to two tables
                            
                                How to create a custom deserializer in Jackson for a generic type?
                            
                                How to make a UIView have an effect of transparent gradient?
                            
                                Difference between System.getProperty("line.separator"); and "\n"?
                            
                                Twig - How to check if variable is a number / integer
                            
                                How to find items using regex in Mongoose [duplicate]
                            
                                React Native - how to set fixed width to individual characters (number, letter, etc)
                            
                                Proper way to re-initialize the data in VueJs 2.0

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

What is `weight_decay` meta parameter in Caffe?

Tags:

Shai

People also ask

2 Answers

Shai

Tal Darom

Recent Activity

Donate For Us