I am attempting to create a custom, Dense layer in Keras to tie weights in an Autoencoder. I have tried following an example for doing this in convolutional layers here, but it seemed like some of the steps did not apply for the Dense layer (also, the code is from over two years ago).
By tying weights, I want the decode layer to use the transposed weight matrix of the encode layer. This approach is also taken in this article (page 5). Below is the relevant quote from the article:
Here, we choose both the encoding and decoding activation function to be sigmoid function and only consider the tied weights case, in which W ′ = WT (where WT is the transpose of W ) as most existing deep learning methods do.
In the quote above, W is the weight matrix in the encode layer and W' (equal to the transpose of W) is the weight matrix in the decode layer.
I did not change too much in the dense layer. I added a tied_to
parameter to the constructor, which allows you to pass the layer you want to tie it to. The only other change was to the build
function, the snippet for this is below:
def build(self, input_shape):
assert len(input_shape) >= 2
input_dim = input_shape[-1]
if self.tied_to is not None:
self.kernel = K.transpose(self.tied_to.kernel)
self._non_trainable_weights.append(self.kernel)
else:
self.kernel = self.add_weight(shape=(input_dim, self.units),
initializer=self.kernel_initializer,
name='kernel',
regularizer=self.kernel_regularizer,
constraint=self.kernel_constraint)
if self.use_bias:
self.bias = self.add_weight(shape=(self.units,),
initializer=self.bias_initializer,
name='bias',
regularizer=self.bias_regularizer,
constraint=self.bias_constraint)
else:
self.bias = None
self.input_spec = InputSpec(min_ndim=2, axes={-1: input_dim})
self.built = True
Below is the __init__
method, the only change here was the addition of the tied_to
parameter.
def __init__(self, units,
activation=None,
use_bias=True,
kernel_initializer='glorot_uniform',
bias_initializer='zeros',
kernel_regularizer=None,
bias_regularizer=None,
activity_regularizer=None,
kernel_constraint=None,
bias_constraint=None,
tied_to=None,
**kwargs):
if 'input_shape' not in kwargs and 'input_dim' in kwargs:
kwargs['input_shape'] = (kwargs.pop('input_dim'),)
super(Dense, self).__init__(**kwargs)
self.units = units
self.activation = activations.get(activation)
self.use_bias = use_bias
self.kernel_initializer = initializers.get(kernel_initializer)
self.bias_initializer = initializers.get(bias_initializer)
self.kernel_regularizer = regularizers.get(kernel_regularizer)
self.bias_regularizer = regularizers.get(bias_regularizer)
self.activity_regularizer = regularizers.get(activity_regularizer)
self.kernel_constraint = constraints.get(kernel_constraint)
self.bias_constraint = constraints.get(bias_constraint)
self.input_spec = InputSpec(min_ndim=2)
self.supports_masking = True
self.tied_to = tied_to
The call
function was not edited, but it is below for reference.
def call(self, inputs):
output = K.dot(inputs, self.kernel)
if self.use_bias:
output = K.bias_add(output, self.bias, data_format='channels_last')
if self.activation is not None:
output = self.activation(output)
return output
Above, I added a conditional to check if the tied_to
parameter was set, and if so, set the layer's kernel to the transpose of the tied_to
layer's kernel.
Below is the code used to instantiate the model. It is done using Keras's sequential API and DenseTied
is my custom layer.
# encoder
#
encoded1 = Dense(2, activation="sigmoid")
decoded1 = DenseTied(4, activation="sigmoid", tied_to=encoded1)
# autoencoder
#
autoencoder = Sequential()
autoencoder.add(encoded1)
autoencoder.add(decoded1)
After training the model, below is the model summary and weights.
autoencoder.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_7 (Dense) (None, 2) 10
_________________________________________________________________
dense_tied_7 (DenseTied) (None, 4) 12
=================================================================
Total params: 22
Trainable params: 14
Non-trainable params: 8
________________________________________________________________
autoencoder.layers[0].get_weights()[0]
array([[-2.122982 , 0.43029135],
[-2.1772149 , 0.16689162],
[-1.0465667 , 0.9828905 ],
[-0.6830663 , 0.0512633 ]], dtype=float32)
autoencoder.layers[-1].get_weights()[1]
array([[-0.6521988 , -0.7131109 , 0.14814234, 0.26533198],
[ 0.04387903, -0.22077179, 0.517225 , -0.21583867]],
dtype=float32)
As you can see, the weights reported by autoencoder.get_weights()
do not seem to be tied.
So after showing my approach, my question is, is this a valid way to tie weights in a Dense Keras layer? I was able to run the code, and it is currently training. It seems that the loss function is decreasing reasonably as well. My fear is that this will only set them equal when the model is build, but not actually tie them. My hope is that the backend transpose
function is tying them through references under the hood, but I am sure that I am missing something.
Implementation of Tying Weights: To implement tying weights, we need to create a custom layer to tie weights between the layer using keras. This custom layer acts as a regular dense layer, but it uses the transposed weights of the encoder's dense layer, however having its own bias vector.
Tying weights 101 An autoencoder with tied weights has decoder weights that are the transpose of the encoder weights; this is a form of parameter sharing, which reduces the number of parameters of the model.
Bottleneck: It is the lower dimensional hidden layer where the encoding is produced. The bottleneck layer has a lower number of nodes and the number of nodes in the bottleneck layer also gives the dimension of the encoding of the input.
5. When should we not use autoencoders? An autoencoder could misclassify input errors that are different from those in the training set or changes in underlying relationships that a human would notice. Another drawback is you may eliminate the vital information in the input data.
So after showing my approach, my question is, is this a valid way to tie weights in a Dense Keras layer?
Yes, it's valid.
My fear is that this will only set them equal when the model is build, but not actually tie them. My hope is that the backend transpose function is tying them through references under the hood, but I am sure that I am missing something.
It actually ties them in a computation graph, you can check in printing model.summary()
that there's just one copy of these trainable weights. Also, after training your model you can check weights of corresponding layers with model.get_weights()
. When the model is build there're no weights yet actually, just placeholders for them.
random.seed(1)
class DenseTied(Layer):
def __init__(self, units,
activation=None,
use_bias=True,
kernel_initializer='glorot_uniform',
bias_initializer='zeros',
kernel_regularizer=None,
bias_regularizer=None,
activity_regularizer=None,
kernel_constraint=None,
bias_constraint=None,
tied_to=None,
**kwargs):
self.tied_to = tied_to
if 'input_shape' not in kwargs and 'input_dim' in kwargs:
kwargs['input_shape'] = (kwargs.pop('input_dim'),)
super().__init__(**kwargs)
self.units = units
self.activation = activations.get(activation)
self.use_bias = use_bias
self.kernel_initializer = initializers.get(kernel_initializer)
self.bias_initializer = initializers.get(bias_initializer)
self.kernel_regularizer = regularizers.get(kernel_regularizer)
self.bias_regularizer = regularizers.get(bias_regularizer)
self.activity_regularizer = regularizers.get(activity_regularizer)
self.kernel_constraint = constraints.get(kernel_constraint)
self.bias_constraint = constraints.get(bias_constraint)
self.input_spec = InputSpec(min_ndim=2)
self.supports_masking = True
def build(self, input_shape):
assert len(input_shape) >= 2
input_dim = input_shape[-1]
if self.tied_to is not None:
self.kernel = K.transpose(self.tied_to.kernel)
self._non_trainable_weights.append(self.kernel)
else:
self.kernel = self.add_weight(shape=(input_dim, self.units),
initializer=self.kernel_initializer,
name='kernel',
regularizer=self.kernel_regularizer,
constraint=self.kernel_constraint)
if self.use_bias:
self.bias = self.add_weight(shape=(self.units,),
initializer=self.bias_initializer,
name='bias',
regularizer=self.bias_regularizer,
constraint=self.bias_constraint)
else:
self.bias = None
self.built = True
def compute_output_shape(self, input_shape):
assert input_shape and len(input_shape) >= 2
assert input_shape[-1] == self.units
output_shape = list(input_shape)
output_shape[-1] = self.units
return tuple(output_shape)
def call(self, inputs):
output = K.dot(inputs, self.kernel)
if self.use_bias:
output = K.bias_add(output, self.bias, data_format='channels_last')
if self.activation is not None:
output = self.activation(output)
return output
# input_ = Input(shape=(16,), dtype=np.float32)
# encoder
#
encoded1 = Dense(4, activation="sigmoid", input_shape=(4,), use_bias=True)
decoded1 = DenseTied(4, activation="sigmoid", tied_to=encoded1, use_bias=False)
# autoencoder
#
autoencoder = Sequential()
# autoencoder.add(input_)
autoencoder.add(encoded1)
autoencoder.add(decoded1)
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
print(autoencoder.summary())
autoencoder.fit(x=np.random.rand(100, 4), y=np.random.randint(0, 1, size=(100, 4)))
print(autoencoder.layers[0].get_weights()[0])
print(autoencoder.layers[1].get_weights()[0])
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With