I'm trying to use Keras to implement part of an algorithm that requires weight clipping, i.e. limiting the weight values after a gradient update. I haven't found any solutions through web searches so far.
For background, this has to do with the WGANs algorithm:
https://arxiv.org/pdf/1701.07875.pdf
If you look at algorithm 1 on page 8, you'll see the following:
I've highlighted the lines that I'm trying to implement in Keras: after computing a gradient to use to update the weights in the network, I want to make sure that all the weights are clipped between some values [-c, c] that I can set.
How could I go about doing this in Keras?
For reference I am using the TensorFlow backend. I don't mind digging into things and adding messy quick-fixes for now.
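For concreteness, the step being asked about could be sketched in plain NumPy roughly like this. It is only an illustration: the helper name `clipped_update` is hypothetical, a plain gradient step stands in for the RMSProp update used in the paper, and the values of `lr` and `c` are picked for the example.

```python
import numpy as np

def clipped_update(w, grad, lr=5e-5, c=0.01):
    """Apply one plain gradient step, then clip every weight to [-c, c]."""
    # Stand-in for the optimizer update (the paper uses RMSProp here).
    w = w + lr * grad
    # The weight clipping step: each entry is forced into [-c, c].
    return np.clip(w, -c, c)

w = np.array([0.009, -0.02, 0.0])
grad = np.array([100.0, -50.0, 10.0])
w_new = clipped_update(w, grad)
# The first two weights end up on the boundary; the third is left alone.
```

The point is that the clipping acts on the weights themselves after the update, not on the gradient that produced it.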
Applying gradient clipping in TensorFlow models is quite straightforward. The only thing you need to do is pass the relevant argument when creating the optimizer. All optimizers accept `clipnorm` and `clipvalue` parameters that can be used to clip the gradients.
With clipping by value, if a gradient component exceeds the threshold, it is clipped to the threshold. If it falls below the negative of the threshold, it is clipped to that lower limit instead.
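As a rough illustration of clipping by value, here is a small NumPy sketch. The helper `clip_by_value` is a hypothetical name for this example, not a Keras function; it only mimics the elementwise behaviour described above.

```python
import numpy as np

def clip_by_value(grads, threshold):
    """Clip each gradient component to [-threshold, threshold]."""
    return np.clip(grads, -threshold, threshold)

grads = np.array([0.1, 3.0, -2.0])
clipped = clip_by_value(grads, 0.5)
# 0.1 is untouched, 3.0 becomes 0.5, and -2.0 becomes -0.5.
```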
Although gradient clipping introduces a bias in the resulting values, it can keep training stable. Recurrent neural networks in particular can be difficult to train, and vanishing and exploding gradients are two common problems with them.
Vanishing gradients occur when the gradient becomes too small for optimization to make progress. Exploding gradients occur when the gradient becomes so large that a single update throws the parameters off. Gradient clipping addresses the exploding case by bounding the size of each update.
When creating the optimizer object, set the `clipvalue` parameter. It will do precisely what you want.
from keras.optimizers import RMSprop

# All parameter gradients will be clipped to
# a maximum value of 0.5 and
# a minimum value of -0.5.
rmsprop = RMSprop(clipvalue=0.5)
and then use this object when compiling the model:
model.compile(loss='mse', optimizer=rmsprop)
For more reference check: here.
Also, I prefer to use `clipnorm` over `clipvalue`, because with `clipnorm` the optimization remains stable. For example, say you have 2 parameters and the gradients come out to be [0.1, 3]. With `clipvalue` set to 0.5, the gradients become [0.1, 0.5], i.e. the direction of steepest descent can change drastically. `clipnorm` doesn't have this problem: all the gradients are scaled by the same factor, so the direction is preserved while the constraint on the magnitude of the gradient is still enforced.
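The [0.1, 3] example can be checked with a short NumPy sketch. The helpers `clip_by_norm` and `direction` are hypothetical names for this illustration, and treating both gradients as one vector is a simplification; how Keras groups gradients when applying `clipnorm` may differ.

```python
import numpy as np

def clip_by_norm(grads, max_norm):
    """Rescale the whole vector so that its L2 norm is at most max_norm."""
    norm = np.linalg.norm(grads)
    if norm > max_norm:
        return grads * (max_norm / norm)
    return grads

def direction(v):
    """Return the unit vector pointing the same way as v."""
    return v / np.linalg.norm(v)

grads = np.array([0.1, 3.0])
by_value = np.clip(grads, -0.5, 0.5)   # becomes [0.1, 0.5]
by_norm = clip_by_norm(grads, 0.5)     # same direction, norm 0.5

# Cosine of the angle between the original and the clipped gradient.
cos_value = direction(grads) @ direction(by_value)
cos_norm = direction(grads) @ direction(by_norm)
```

Here `cos_norm` comes out as 1 (direction unchanged), while `cos_value` is below 1 because clipping by value shrinks only the large component.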
Edit: The question asks about weight clipping, not gradient clipping:
Clipping of weights is not part of the Keras code, but a maxnorm constraint on weights is. Check here.
Having said that it can be easily implemented. Here is a very small example:
from keras.constraints import Constraint
from keras import backend as K

class WeightClip(Constraint):
    '''Clips the weights incident to each hidden unit to be inside a range
    '''
    def __init__(self, c=2):
        self.c = c

    def __call__(self, p):
        return K.clip(p, -self.c, self.c)

    def get_config(self):
        return {'name': self.__class__.__name__,
                'c': self.c}
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
model = Sequential()
# W_constraint is the Keras 1 argument name; Keras 2 and later call it kernel_constraint.
model.add(Dense(30, input_dim=100, W_constraint=WeightClip(2)))
model.add(Dense(1))
model.compile(loss='mse', optimizer='rmsprop')
X = np.random.random((1000,100))
Y = np.random.random((1000,1))
model.fit(X,Y)
I have tested that the code above runs, but not that the constraint is actually enforced. You can check this by getting the model weights after training with model.get_weights() or model.layers[idx].get_weights() and verifying that they stay within the constraint.
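A check along those lines might look like the sketch below. The helper `weights_within` is a hypothetical name, and the arrays are made-up stand-ins for what `get_weights()` returns (a list of NumPy arrays), so the snippet runs without a trained model.

```python
import numpy as np

def weights_within(weights, c, tol=1e-7):
    """Return True if every entry of every array lies in [-c, c]."""
    return all(np.all(np.abs(w) <= c + tol) for w in weights)

# Stand-in for a list of weight arrays returned by get_weights().
fake_weights = [np.array([[0.5, -1.9], [2.0, 0.0]]),
                np.array([0.1, -0.3])]

inside_2 = weights_within(fake_weights, c=2)   # all entries are within [-2, 2]
inside_1 = weights_within(fake_weights, c=1)   # -1.9 and 2.0 fall outside [-1, 1]
```

With the model above, only the first layer's `W` carries the constraint, so it would make sense to pass just that array, e.g. `[model.layers[0].get_weights()[0]]`, rather than all of the model's weights.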
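Note: The constraint is not applied to all of the model's weights, only to the weights of the specific layer it is passed to. Also, `W_constraint` constrains the `W` param and `b_constraint` constrains the `b` (bias) param. In Keras 2 and later these arguments are named `kernel_constraint` and `bias_constraint`.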