I'm trying to use TensorFlow for my deep learning project.
When I use Momentum Gradient Descent, how is the weight cost (decay) strength set?
(The λ in this formula.)
TensorFlow's optimizers do not include a weight cost/decay term.
It is easy to add one yourself, however, by appending an L2 penalty on the weights to the cost function (note that lambda is a reserved word in Python, so the decay strength is named weight_decay here):
C = <your initial cost function>
l2_loss = tf.add_n([tf.nn.l2_loss(v) for v in tf.trainable_variables()])
C = C + weight_decay * l2_loss
tf.nn.l2_loss(v) is simply 0.5 * tf.reduce_sum(v * v), so the gradient of the penalty with respect to each individual weight w is λ * w, which is equivalent to the term in your linked equation.
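To convince yourself of that gradient claim, here is a small NumPy sketch (TensorFlow is not needed for the math): it uses the same definition as tf.nn.l2_loss, 0.5 * sum(w * w), and checks the analytic gradient λ * w against a finite-difference estimate. The weight values and decay strength are arbitrary illustrative choices (lam stands in for λ, since lambda is a reserved word in Python).

```python
import numpy as np

# Arbitrary example weights and decay strength (lam stands in for lambda).
w = np.array([0.5, -1.2, 3.0])
lam = 0.01

def penalty(w):
    # Same definition as tf.nn.l2_loss: 0.5 * sum(w * w), scaled by lam.
    return lam * 0.5 * np.sum(w * w)

# Analytic gradient of the penalty: lam * w for each weight.
analytic = lam * w

# Central finite-difference estimate of the same gradient.
eps = 1e-6
numeric = np.array([
    (penalty(w + eps * e) - penalty(w - eps * e)) / (2 * eps)
    for e in np.eye(len(w))
])

print(np.allclose(analytic, numeric))  # the two gradients agree
```

Because the penalty is quadratic, the centered finite difference matches the analytic gradient up to floating-point noise.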