In one of the tutorials I am working on (link given below), the author outlines the baseline neural network structure as:
Convolutional input layer, 32 feature maps with a size of 3×3, a rectifier activation function and a weight constraint of max norm set to 3.
model.add(Conv2D(32, (3, 3), input_shape=(3, 32, 32), padding='same', activation='relu', kernel_constraint=maxnorm(3)))
What does weight constraint of max norm mean and do to the Conv layer? (We are using Keras.)
https://machinelearningmastery.com/object-recognition-convolutional-neural-networks-keras-deep-learning-library/
Thank you!
Max-norm regularization is a regularization technique that constrains the weights of a neural network. The constraint imposed on the network by max-norm regularization is simple: the weight vector associated with each neuron is forced to have an L2 norm of at most r, where r is a hyperparameter.
A weight constraint is an update applied to the network that checks the size of the weights and, if the size exceeds a predefined limit, rescales the weights so that they fall back below that limit (or within a specified range).
The keras.constraints module allows setting constraints (e.g. non-negativity) on model parameters during training. They are per-variable projection functions applied to the target variable after each gradient update (when using fit()).
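For example, attaching such a constraint to a layer looks like this (a minimal sketch assuming the tf.keras API, where maxnorm is spelled MaxNorm; the layer parameters are just the ones from the question):

from tensorflow.keras.constraints import MaxNorm
from tensorflow.keras.layers import Conv2D

# After every gradient update, the kernel of this layer is projected back
# so that its norm does not exceed 3.
layer = Conv2D(32, (3, 3), padding='same', activation='relu',
               kernel_constraint=MaxNorm(3))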
What does a weight constraint of max_norm do?

maxnorm(m) will, if the L2 norm of your weights exceeds m, scale your whole weight matrix by a factor that reduces the norm to m.
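A tiny numerical illustration of that rescaling (plain NumPy, not library code; the numbers are made up):

import numpy as np

m = 3.0
w = np.array([2.0, 4.0, 4.0])            # L2 norm = 6.0, which exceeds m
norm = np.linalg.norm(w)
w_rescaled = w * (min(norm, m) / norm)   # scale factor 0.5
print(w_rescaled)                        # [1. 2. 2.], L2 norm is now 3.0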
As you can see in the Keras code for class MaxNorm(Constraint) (the source now lives in TensorFlow):
def __call__(self, w):
    # L2 norm of the weights along the configured axis
    norms = K.sqrt(K.sum(K.square(w), axis=self.axis, keepdims=True))
    # cap the norms at max_value
    desired = K.clip(norms, 0, self.max_value)
    # rescale the weights so their norm never exceeds max_value
    w *= (desired / (K.epsilon() + norms))
    return w
Additionally, maxnorm has an axis argument along which the norm is calculated. In your example you don't specify an axis, so the norm is calculated over the whole weight matrix. If, for example, you want to constrain the norm of every convolutional filter, and assuming that you are using tf dimension ordering, the weight matrix will have the shape (rows, cols, input_depth, output_depth). Calculating the norm over axis = [0, 1, 2] will constrain each filter to the given norm (see the sketch below).
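A sketch of what that looks like in practice (assuming the tf.keras spelling MaxNorm; the filter count and kernel size are arbitrary):

from tensorflow.keras.constraints import MaxNorm
from tensorflow.keras.layers import Conv2D

# With axis=[0, 1, 2] the norm is computed per filter, i.e. over the
# (rows, cols, input_depth) axes of the (rows, cols, input_depth, output_depth) kernel.
layer = Conv2D(32, (3, 3), padding='same', activation='relu',
               kernel_constraint=MaxNorm(3, axis=[0, 1, 2]))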
Why do it?

Constraining the weight matrix directly is another kind of regularization. If you use a simple L2 regularization term, you penalize high weights through the loss function. With this constraint, you regularize directly.
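To make the contrast concrete, here is a rough side-by-side (tf.keras API assumed; the layer sizes and coefficients are arbitrary):

from tensorflow.keras.layers import Dense
from tensorflow.keras.regularizers import l2
from tensorflow.keras.constraints import MaxNorm

# L2 regularization: adds a penalty on large weights to the loss.
penalized = Dense(64, activation='relu', kernel_regularizer=l2(0.01))

# Max-norm constraint: leaves the loss unchanged and instead rescales the
# weights after each gradient update so their norm never exceeds 3.
constrained = Dense(64, activation='relu', kernel_constraint=MaxNorm(3))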
As also linked in the keras code, this seems to work especially well in combination with a dropout layer. For more info, see chapter 5.1 in this paper.
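A sketch of that dropout + max-norm combination (tf.keras API assumed; the layer sizes and dropout rate are illustrative, not taken from the paper):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.constraints import MaxNorm

model = Sequential([
    Dense(128, activation='relu', input_shape=(100,), kernel_constraint=MaxNorm(3)),
    Dropout(0.5),
    Dense(10, activation='softmax'),
])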