
What does kernel_constraint=max_norm(3) do?

In one of the tutorials I am working through (link given below), the author outlines the baseline neural network structure as:

Convolutional input layer, 32 feature maps with a size of 3×3, a rectifier activation function and a weight constraint of max norm set to 3.

model.add(Conv2D(32, (3, 3), input_shape=(3, 32, 32), padding='same', activation='relu', kernel_constraint=maxnorm(3)))

What does a weight constraint of max norm mean, and what does it do to the Conv layer? (We are using Keras.)

https://machinelearningmastery.com/object-recognition-convolutional-neural-networks-keras-deep-learning-library/

Thank you!

asked Aug 30 '17 by aztec242


People also ask

What is Maxnorm?

Max-norm regularization is a regularization technique that constrains the weights of a neural network. The constraint imposed on the network by max-norm regularization is simple: the weight vector associated with each neuron is forced to have an L2 norm of at most r, where r is a hyperparameter.
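As a minimal sketch of that projection (not from the original page; the helper name project_max_norm is made up for illustration):

import numpy as np

def project_max_norm(w, r):
    # Rescale w so that its L2 norm is at most r; leave it unchanged otherwise.
    norm = np.linalg.norm(w)
    return w * (r / norm) if norm > r else w

w = np.array([3.0, 4.0])           # L2 norm = 5
print(project_max_norm(w, 2.0))    # scaled down to norm 2 -> [1.2 1.6]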

What is weight constraint?

A weight constraint is an update to the network that checks the size of the weights and, if the size exceeds a predefined limit, rescales the weights so that they fall below the limit or within a given range.

What is a kernel constraint in Keras?

Classes from the keras.constraints module allow setting constraints (e.g. non-negativity) on model parameters during training. They are per-variable projection functions applied to the target variable after each gradient update (when using fit()).
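For example, a minimal sketch using the standard tf.keras API (the layer sizes are arbitrary):

from tensorflow.keras import layers, constraints

# Force one Dense layer's kernel to stay non-negative, and cap the
# L2 norm of another layer's kernel columns at 3 (default axis=0).
layer_a = layers.Dense(64, kernel_constraint=constraints.NonNeg())
layer_b = layers.Dense(64, kernel_constraint=constraints.MaxNorm(3))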


1 Answer

What does a weight constraint of max_norm do?

maxnorm(m) will, if the L2 norm of your weights exceeds m, scale your weights by a factor that reduces the norm to m. You can see this in the Keras source, in class MaxNorm(Constraint) (the code now lives in the TensorFlow repository):

def __call__(self, w):
    # L2 norm of w along the configured axis
    norms = K.sqrt(K.sum(K.square(w), axis=self.axis, keepdims=True))
    # cap the norm at max_value
    desired = K.clip(norms, 0, self.max_value)
    # rescale w so the norm does not exceed max_value
    w *= (desired / (K.epsilon() + norms))
    return w
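For example, applying the constraint directly to a small tensor shows the rescaling (a hedged demo, not part of the original answer):

import tensorflow as tf
from tensorflow.keras.constraints import MaxNorm

w = tf.constant([[3.0], [4.0]])   # one column with L2 norm 5
print(MaxNorm(3, axis=0)(w))      # rescaled to norm 3 -> [[1.8], [2.4]]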

Additionally, maxnorm has an axis argument, along which the norm is calculated. In your example you don't specify an axis, so the default axis=0 is used, i.e. the norm is computed down each column of the weight matrix. If, for example, you want to constrain the norm of every convolutional filter, and assuming you are using TensorFlow dimension ordering, the weight matrix will have the shape (rows, cols, input_depth, output_depth); calculating the norm over axis=[0, 1, 2] will then constrain each filter to the given norm.
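For instance, a sketch of a per-filter constraint using the current tf.keras spelling MaxNorm (the layer hyperparameters here are illustrative):

from tensorflow.keras import layers
from tensorflow.keras.constraints import MaxNorm

# Conv2D kernels have shape (rows, cols, input_depth, output_depth);
# taking the norm over axis=[0, 1, 2] constrains each filter separately.
conv = layers.Conv2D(32, (3, 3), padding='same', activation='relu',
                     kernel_constraint=MaxNorm(3, axis=[0, 1, 2]))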

Why do it?

Constraining the weight matrix directly is another kind of regularization. If you use a simple L2 regularization term, you penalize high weights via your loss function. With this constraint, you regularize directly. As also linked in the Keras code, this seems to work especially well in combination with a dropout layer. For more info, see chapter 5.1 in this paper.
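As an illustration of that combination, a hedged sketch pairing dropout with a max-norm constraint (the layer sizes and dropout rate are arbitrary):

from tensorflow.keras import Sequential, layers
from tensorflow.keras.constraints import MaxNorm

# Dropout plus a max-norm constraint on the preceding layer's weights,
# the pairing discussed in the paper referenced above.
model = Sequential([
    layers.Dense(512, activation='relu', input_shape=(784,),
                 kernel_constraint=MaxNorm(3)),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax'),
])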

answered Sep 23 '22 by McLawrence