I have noticed that weight_regularizer is no longer available in Keras and that, in its place, there are activity_regularizer and kernel_regularizer. I would like to know the difference between them.
Regularization is the process of fine-tuning a neural network by adding a penalty term to the loss function, so as to obtain a model that converges with minimal loss and generalizes better to unseen data.
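In symbols, a common formulation (with lambda a tunable penalty factor) is:

    loss_total = loss_data + lambda * penalty

where penalty is, for example, the sum of absolute parameter values (L1) or the sum of squared parameter values (L2).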
Activity regularization is specified on a layer in Keras. This can be achieved by setting the activity_regularizer argument on the layer to an instantiated and configured regularizer class. The regularizer is applied to the output of the layer, but you have control over what the “output” of the layer actually means: the layer's activation can be applied inside the layer itself or as a separate step afterwards.
The kernel regularizer penalizes the layer's kernel (its weights matrix) but not its bias; the bias is penalized by the bias regularizer. The activity regularizer penalizes the output of the layer.
To add a regularizer to a layer, simply pass your preferred regularization technique to the layer's keyword argument 'kernel_regularizer' (or 'bias_regularizer' / 'activity_regularizer'). Keras's built-in regularizers take a parameter that sets the regularization hyperparameter (the penalty factor).
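For instance, here is a minimal sketch showing all three regularizers on a single Dense layer (the layer size and the penalty factors 0.01 and 0.001 are illustrative, not recommendations):

    from keras import regularizers
    from keras.layers import Dense

    # Penalize the weights, the bias, and the layer output independently.
    layer = Dense(
        64,
        activation='relu',
        kernel_regularizer=regularizers.l2(0.01),     # penalty on the weights matrix
        bias_regularizer=regularizers.l2(0.01),       # penalty on the bias vector
        activity_regularizer=regularizers.l1(0.001),  # penalty on the layer's output
    )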
The activity regularizer works as a function of the output of the net, and is mostly used to regularize hidden units, while weight_regularizer, as the name says, works on the weights (e.g. making them decay). Basically you can express the regularization loss as a function of the output (activity_regularizer) or of the weights (weight_regularizer).
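To make this concrete, Keras also accepts a plain function as a regularizer, so the penalty can be written explicitly as a function of either tensor. A minimal sketch (the 0.01 factor is arbitrary):

    from keras import backend as K
    from keras.layers import Dense

    def l1_weight_penalty(weight_matrix):
        # Regularization loss expressed as a function of the weights.
        return 0.01 * K.sum(K.abs(weight_matrix))

    def l1_activity_penalty(output):
        # The same idea, expressed as a function of the layer's output.
        return 0.01 * K.sum(K.abs(output))

    layer = Dense(64,
                  kernel_regularizer=l1_weight_penalty,
                  activity_regularizer=l1_activity_penalty)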
The new kernel_regularizer replaces weight_regularizer, although it's not very clear from the documentation.

From the definition of kernel_regularizer:

kernel_regularizer: Regularizer function applied to the kernel weights matrix (see regularizer).

And activity_regularizer:

activity_regularizer: Regularizer function applied to the output of the layer (its "activation") (see regularizer).
Important Edit: Note that there is a bug in the activity_regularizer that was only fixed in version 2.1.4 of Keras (at least with the TensorFlow backend). In older versions, the activity regularizer function is applied to the input of the layer instead of to its output (the actual activations of the layer, as intended). So beware: if you are using a version of Keras older than 2.1.4, activity regularization may not work as intended.
You can see the commit on GitHub: five months ago François Chollet provided a fix to the activity regularizer, which was then included in Keras 2.1.4.
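If you are unsure which version you are running, you can check it directly:

    import keras
    print(keras.__version__)  # activity_regularizer only behaves as documented from 2.1.4 onwards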
This answer is a bit late, but may be useful for future readers. As they say, necessity is the mother of invention: I only understood this distinction when I needed it.
The above answer doesn't really state the difference, because both regularizers end up affecting the weights. So what is the difference between penalizing the weights themselves and penalizing the output of the layer?
Here is the answer: I encountered a case where the weights of the net were small and well behaved, ranging between -0.3 and +0.3. There was nothing to penalize, so a kernel regularizer was useless. However, the output of the layer was HUGE, in the hundreds.
Keep in mind that the input to the layer was also small, always less than one. But those small values interacted with the weights in such a way that they produced those massive outputs. That is when I realized that what I needed was an activity regularizer rather than a kernel regularizer: it penalizes the layer for those large outputs. I didn't care that the weights themselves were small; I just wanted to keep the layer from reaching such a state, because it saturates my sigmoid activation and causes plenty of other trouble, like vanishing gradients and stagnation.
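A minimal sketch of that setup (the layer sizes and the 1e-4 factor are illustrative): the penalty is attached to a linear layer's output, and the sigmoid is applied as a separate step, so the regularizer acts on the pre-sigmoid values that would otherwise saturate it.

    from keras import regularizers
    from keras.layers import Activation, Dense
    from keras.models import Sequential

    model = Sequential([
        # Linear layer: the activity regularizer penalizes its (pre-sigmoid) outputs,
        # discouraging the huge values that would saturate the sigmoid.
        Dense(32, input_shape=(10,), activation='linear',
              activity_regularizer=regularizers.l2(1e-4)),
        Activation('sigmoid'),
        Dense(1),
    ])
    model.compile(optimizer='adam', loss='mse')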