In one of the tutorials I am working on (link given below), the author outlines the baseline neural network structure as:
Convolutional input layer, 32 feature maps with a size of 3×3, a rectifier activation function and a weight constraint of max norm set to 3.
model.add(Conv2D(32, (3, 3), input_shape=(3, 32, 32), padding='same', activation='relu', kernel_constraint=maxnorm(3)))
What does weight constraint of max norm mean and do to the Conv layer? (We are using Keras.)
https://machinelearningmastery.com/object-recognition-convolutional-neural-networks-keras-deep-learning-library/
Thank you!
Max-norm regularization is a regularization technique that constrains the weights of a neural network. The constraint imposed on the network by max-norm regularization is simple: the weight vector associated with each neuron is forced to have an L2 norm of at most r, where r is a hyperparameter.
A weight constraint is an update applied to the network that checks the size of the weights and, if the size exceeds a predefined limit, rescales the weights so that they fall back below that limit (or within a specified range).
The keras.constraints module allows setting constraints (e.g. non-negativity) on model parameters during training. They are per-variable projection functions applied to the target variable after each gradient update (when using fit()).
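For example, attaching such a constraint to a layer looks like this (a minimal sketch assuming the tf.keras API, where maxnorm is spelled MaxNorm; the layer parameters are just the ones from the question):

from tensorflow.keras.constraints import MaxNorm
from tensorflow.keras.layers import Conv2D

# After every gradient update, the kernel of this layer is projected back
# so that its norm does not exceed 3.
layer = Conv2D(32, (3, 3), padding='same', activation='relu',
               kernel_constraint=MaxNorm(3))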
What does a weight constraint of max_norm do?

maxnorm(m) will, if the L2 norm of your weights exceeds m, scale your whole weight matrix by a factor that reduces the norm to m.
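A tiny numerical illustration of that rescaling (plain NumPy, not library code; the numbers are made up):

import numpy as np

m = 3.0
w = np.array([2.0, 4.0, 4.0])            # L2 norm = 6.0, which exceeds m
norm = np.linalg.norm(w)
w_rescaled = w * (min(norm, m) / norm)   # scale factor 0.5
print(w_rescaled)                        # [1. 2. 2.], L2 norm is now 3.0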
As you can see in the Keras code for class MaxNorm(Constraint) (the source now lives in TensorFlow):
def __call__(self, w):
    # L2 norm of the weights along the configured axis
    norms = K.sqrt(K.sum(K.square(w), axis=self.axis, keepdims=True))
    # cap the norms at max_value
    desired = K.clip(norms, 0, self.max_value)
    # rescale the weights so their norm never exceeds max_value
    w *= (desired / (K.epsilon() + norms))
    return w
Additionally, maxnorm has an axis argument along which the norm is calculated. In your example you don't specify an axis, so the norm is calculated over the whole weight matrix. If, for example, you want to constrain the norm of every convolutional filter, and assuming that you are using tf dimension ordering, the weight matrix will have the shape (rows, cols, input_depth, output_depth). Calculating the norm over axis = [0, 1, 2] will constrain each filter to the given norm (see the sketch below).
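A sketch of what that looks like in practice (assuming the tf.keras spelling MaxNorm; the filter count and kernel size are arbitrary):

from tensorflow.keras.constraints import MaxNorm
from tensorflow.keras.layers import Conv2D

# With axis=[0, 1, 2] the norm is computed per filter, i.e. over the
# (rows, cols, input_depth) axes of the (rows, cols, input_depth, output_depth) kernel.
layer = Conv2D(32, (3, 3), padding='same', activation='relu',
               kernel_constraint=MaxNorm(3, axis=[0, 1, 2]))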
Why do it?

Constraining the weight matrix directly is another kind of regularization. If you use a simple L2 regularization term, you penalize high weights through the loss function. With this constraint, you regularize directly.
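To make the contrast concrete, here is a rough side-by-side (tf.keras API assumed; the layer sizes and coefficients are arbitrary):

from tensorflow.keras.layers import Dense
from tensorflow.keras.regularizers import l2
from tensorflow.keras.constraints import MaxNorm

# L2 regularization: adds a penalty on large weights to the loss.
penalized = Dense(64, activation='relu', kernel_regularizer=l2(0.01))

# Max-norm constraint: leaves the loss unchanged and instead rescales the
# weights after each gradient update so their norm never exceeds 3.
constrained = Dense(64, activation='relu', kernel_constraint=MaxNorm(3))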
As also linked in the keras code, this seems to work especially well in combination with a dropout layer. For more info, see chapter 5.1 in this paper.
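A sketch of that dropout + max-norm combination (tf.keras API assumed; the layer sizes and dropout rate are illustrative, not taken from the paper):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.constraints import MaxNorm

model = Sequential([
    Dense(128, activation='relu', input_shape=(100,), kernel_constraint=MaxNorm(3)),
    Dropout(0.5),
    Dense(10, activation='softmax'),
])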