How can I implement max norm constraints in an MLP in tensorflow?

Tags: tensorflow, keras

How can I implement max-norm constraints on the weights of an MLP in TensorFlow, of the kind that Hinton and Dean describe in their work on dark knowledge? Specifically, does tf.nn.dropout implement the weight constraints by default, or do we need to apply them explicitly, as described in the following paper?

https://arxiv.org/pdf/1207.0580.pdf

"[...] if these networks share the same weights for the hidden units that are present. We use the standard, stochastic gradient descent procedure for training the dropout neural networks on mini-batches of training cases, but we modify the penalty term that is normally used to prevent the weights from growing too large. Instead of penalizing the squared length (L2 norm) of the whole weight vector, we set an upper bound on the L2 norm of the incoming weight vector for each individual hidden unit. If a weight-update violates this constraint, we renormalize the weights of the hidden unit by division."
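The rule in the quoted passage can be written compactly. The symbols here are assumptions chosen for illustration and are not taken from the paper: w_j is the incoming weight vector of hidden unit j, and c is the upper bound on its L2 norm.

```latex
\mathbf{w}_j \leftarrow
\begin{cases}
  \dfrac{c}{\lVert \mathbf{w}_j \rVert_2}\,\mathbf{w}_j & \text{if } \lVert \mathbf{w}_j \rVert_2 > c, \\[1ex]
  \mathbf{w}_j & \text{otherwise.}
\end{cases}
```

The check is applied after each weight update, so a unit whose norm is already within the bound is left unchanged.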

Keras appears to support this:

http://keras.io/constraints/


asked Jun 14 '16 02:06

Charles

1 Answer

tf.nn.dropout does not impose any norm constraint. I believe what you're looking for is to "process the gradients before applying them" using tf.clip_by_norm.

For example, instead of simply:

# Create an optimizer; minimize() implicitly calls compute_gradients() and apply_gradients().
train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

You could:

# Create an optimizer.
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
# Compute the gradients for a list of variables.
grads_and_vars = optimizer.compute_gradients(loss, [weights1, weights2, ...])
# grads_and_vars is a list of (gradient, variable) tuples.
# Process the 'gradient' part as needed, for example by capping its norm.
capped_grads_and_vars = [(tf.clip_by_norm(grad, clip_norm=123.0, axes=[0]), var)
                         for grad, var in grads_and_vars]
# Ask the optimizer to apply the capped gradients; this returns the training op.
train_op = optimizer.apply_gradients(capped_grads_and_vars)
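Keep in mind that the snippet above caps the norm of the gradients, which limits the size of each update but does not by itself bound the weights. To enforce the constraint quoted in the question, one option is to clip each weight variable after the update. The following is a minimal sketch, assuming TensorFlow 2.x with eager execution; the helper name clip_columns_to_max_norm and the example values are illustrative and not part of the original answer.

```python
import tensorflow as tf


def clip_columns_to_max_norm(weights, max_norm):
    """Rescale, in place, every column of `weights` whose L2 norm exceeds max_norm."""
    # axes=[0] computes one norm per column, i.e. per receiving unit
    # when weights has shape (in_units, out_units).
    clipped = tf.clip_by_norm(weights, clip_norm=max_norm, axes=[0])
    return weights.assign(clipped)


# Hypothetical 2x2 weight matrix: the columns have L2 norms 5.0 and 0.5.
weights = tf.Variable([[3.0, 0.3],
                       [4.0, 0.4]])

# Call this after each optimizer step. With max_norm=1.0 the first column is
# rescaled to unit norm and the second column is left unchanged.
clip_columns_to_max_norm(weights, max_norm=1.0)
print(weights.numpy())
```

In the graph-mode API used above, the value returned by assign is an op that you would run after each training step. Keras offers the same idea through the kernel_constraint argument, for example tf.keras.constraints.MaxNorm(max_value=1.0, axis=0).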

I hope this helps. Final notes about tf.clip_by_norm's axes parameter:

If you're calculating tf.nn.xw_plus_b(x, weights, biases), or equivalently tf.matmul(x, weights) + biases, where x and weights have shapes (batch, in_units) and (in_units, out_units) respectively, then you probably want axes=[0], because in this layout each column holds all the incoming weights of one unit.
Pay attention to the shapes of the variables above and to whether, and how exactly, you want to apply clip_by_norm to each of them. For example, if some of [weights1, weights2, ...] are matrices and some are not, calling clip_by_norm() on every entry of grads_and_vars with the same axes value, as in the list comprehension above, does not mean the same thing for every variable. If you're lucky, this produces an odd error such as ValueError: Invalid reduction dimension 1 for input with 1 dimensions; otherwise it is a very sneaky bug.
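To make the axes notes concrete, here is a small sketch that uses a NumPy stand-in for tf.clip_by_norm. The stand-in follows the documented formula t * clip_norm / max(l2norm, clip_norm) and is an assumption for illustration, not TensorFlow's own code; the array values are made up.

```python
import numpy as np


def clip_by_norm(t, clip_norm, axes):
    """NumPy stand-in for tf.clip_by_norm, reducing the L2 norm over `axes`."""
    l2norm = np.sqrt(np.sum(t * t, axis=tuple(axes), keepdims=True))
    return t * clip_norm / np.maximum(l2norm, clip_norm)


# Shape (in_units=2, out_units=2): each column holds one unit's incoming weights.
weights = np.array([[3.0, 0.3],
                    [4.0, 0.4]])
biases = np.array([3.0, 4.0])  # shape (out_units,)

per_unit = clip_by_norm(weights, 1.0, axes=[0])    # one norm per column (per unit)
per_input = clip_by_norm(weights, 1.0, axes=[1])   # one norm per row, not per unit
whole_bias = clip_by_norm(biases, 1.0, axes=[0])   # 1-D input: norm of the whole vector

# The same axes value can also be invalid for a 1-D variable.
try:
    clip_by_norm(biases, 1.0, axes=[1])
    bias_error = None
except ValueError as err:  # NumPy's AxisError is a subclass of ValueError
    bias_error = err

print(per_unit)
print(per_input)
print(whole_bias)
print(bias_error)
```

With axes=[0] only the first column, whose norm exceeds the bound, is rescaled. With axes=[1] both rows are rescaled, which mixes weights that belong to different units. For the 1-D bias, axes=[0] silently clips the whole vector, which is the quiet variant of the bug described above.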


answered Oct 02 '22 16:10

Yaniv
