Should I avoid using L2 regularization in conjunction with RMSprop and NAG?
Does the L2 regularization term interfere with the gradient algorithm (RMSprop)?
Best regards,
Elastic Net: when L1 and L2 regularization are combined, the result is the elastic net method, which adds one extra hyperparameter controlling the mix between the two penalties.
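For concreteness, here is a minimal sketch of that penalty in NumPy; the names `alpha` and `l1_ratio` follow the common scikit-learn convention and are illustrative, not something from the post above.

```python
# Minimal sketch of the elastic net penalty as it is usually written:
# alpha * (l1_ratio * ||w||_1 + 0.5 * (1 - l1_ratio) * ||w||_2^2).
# `l1_ratio` is the extra mixing hyperparameter mentioned above.
import numpy as np

def elastic_net_penalty(w, alpha=0.1, l1_ratio=0.5):
    l1 = np.sum(np.abs(w))       # L1 part: sum of absolute weights
    l2 = 0.5 * np.sum(w ** 2)    # L2 part: half the sum of squared weights
    return alpha * (l1_ratio * l1 + (1.0 - l1_ratio) * l2)

print(elastic_net_penalty(np.array([3.0, -2.0, 0.0])))
```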
From a practical standpoint, L1 tends to shrink coefficients to zero whereas L2 tends to shrink coefficients evenly. L1 is therefore useful for feature selection, as we can drop any variables associated with coefficients that go to zero. L2, on the other hand, is useful when you have collinear/codependent features.
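A rough illustration of that difference, assuming scikit-learn's Lasso (L1) and Ridge (L2) estimators; the synthetic data and penalty strengths are made up for the example.

```python
# Illustrative sketch: Lasso (L1) drives some coefficients exactly to zero,
# while Ridge (L2) shrinks them all and spreads weight across collinear features.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
X[:, 5] = X[:, 4] + 0.01 * rng.normal(size=200)   # two nearly collinear features
y = X @ np.array([4.0, 0.0, 0.0, 0.0, 2.0, 2.0]) + 0.1 * rng.normal(size=200)

print("L1 (Lasso):", Lasso(alpha=0.1).fit(X, y).coef_)  # sparse: several exact zeros
print("L2 (Ridge):", Ridge(alpha=1.0).fit(X, y).coef_)  # even shrinkage over the collinear pair
```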
L1 regularization is more robust than L2 regularization for a fairly obvious reason. L2 regularization squares the weights, so the cost of large weights (for example, those driven by outliers in the data) grows quadratically, whereas L1 regularization takes the absolute values of the weights, so the cost only grows linearly: a weight of 10 contributes 100 to the L2 penalty but only 10 to the L1 penalty.
Briefly, L2 regularization works by adding a term to the error function used by the training algorithm. The additional term penalizes large weight values. The two most common error functions used in neural network training are squared error and cross entropy error.
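As a hedged sketch of what that looks like for squared error, in plain NumPy (the name `lam` for the regularization strength is illustrative):

```python
# Squared-error loss with an added L2 penalty term that discourages large weights.
import numpy as np

def l2_regularized_squared_error(y_true, y_pred, weights, lam=1e-3):
    data_term = 0.5 * np.mean((y_true - y_pred) ** 2)  # squared error on the data
    penalty = lam * np.sum(weights ** 2)               # penalizes large weight values
    return data_term + penalty
```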
It seems someone has since sorted out (in 2018) this question (asked in 2017).
Vanilla adaptive gradients (RMSProp, Adagrad, Adam, etc) do not match well with L2 regularization.
Link to the paper (https://arxiv.org/pdf/1711.05101.pdf) and a short excerpt from its introduction:
In this paper, we show that a major factor of the poor generalization of the most popular adaptive gradient method, Adam, is due to the fact that L2 regularization is not nearly as effective for it as for SGD.
L2 regularization and weight decay are not identical. Contrary to common belief, the two techniques are not equivalent in general: for SGD they can be made equivalent by reparameterizing the weight decay factor based on the learning rate, but this is not the case for Adam. In particular, when combined with adaptive gradients, L2 regularization leads to weights with large gradients being regularized less than they would be under (decoupled) weight decay.
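A sketch of that difference for a single Adam-style update: the coupled L2 term is folded into the gradient and therefore rescaled by the adaptive denominator, while decoupled weight decay shrinks the weights directly. Variable names and default values here are illustrative, not taken from the paper's code.

```python
# Coupled L2 regularization vs. decoupled weight decay in one Adam-style step.
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8,
              lam=1e-2, decoupled=False):
    if not decoupled:
        # L2 regularization: added to the gradient, so it is later divided by
        # sqrt(v_hat) like every other gradient component (large-gradient weights
        # end up regularized less).
        grad = grad + lam * w
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)          # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)          # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    if decoupled:
        # Decoupled weight decay: applied directly to the weights,
        # untouched by the adaptive rescaling.
        w = w - lr * lam * w
    return w, m, v
```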