Tying Autoencoder Weights in a Dense Keras Layer

I am attempting to create a custom Dense layer in Keras to tie weights in an autoencoder. I have tried following an example for doing this in convolutional layers here, but some of the steps did not seem to apply to the Dense layer (also, the code is over two years old).

By tying weights, I mean that the decode layer should use the transposed weight matrix of the encode layer. This approach is also taken in this article (page 5). Below is the relevant quote from the article:

Here, we choose both the encoding and decoding activation function to be sigmoid function and only consider the tied weights case, in which W' = W^T (where W^T is the transpose of W) as most existing deep learning methods do.

In the quote above, W is the weight matrix in the encode layer and W' (equal to the transpose of W) is the weight matrix in the decode layer.
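
As a minimal sketch of what the quoted tied-weights case means numerically, the NumPy snippet below runs a sigmoid encoder followed by a decoder that reuses the transposed encoder matrix. The helper name `tied_autoencoder_forward`, the separate bias vectors, and the 4-to-2 shape are assumptions chosen for illustration, not something taken from the article.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def tied_autoencoder_forward(x, W, b_enc, b_dec):
    """Encode with W and decode with W.T (hypothetical helper)."""
    hidden = sigmoid(x @ W + b_enc)        # shape: (batch, hidden_dim)
    recon = sigmoid(hidden @ W.T + b_dec)  # shape: (batch, input_dim)
    return hidden, recon


rng = np.random.default_rng(0)
W = rng.normal(size=(4, 2))  # the only weight matrix in the model
b_enc = np.zeros(2)          # encoder bias
b_dec = np.zeros(4)          # decoder keeps its own bias
x = rng.random((3, 4))       # three samples with four features each

hidden, recon = tied_autoencoder_forward(x, W, b_enc, b_dec)
print(hidden.shape, recon.shape)
```

With a 4-dimensional input and 2 hidden units, this arrangement has 8 + 2 + 4 = 14 free parameters, since the decoder contributes only its bias.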

I did not change much in the Dense layer. I added a tied_to parameter to the constructor, which lets you pass in the layer you want to tie to. The only other change was to the build function; the snippet is below:

def build(self, input_shape):
    assert len(input_shape) >= 2
    input_dim = input_shape[-1]

    if self.tied_to is not None:
        self.kernel = K.transpose(self.tied_to.kernel)
        self._non_trainable_weights.append(self.kernel)
    else:
        self.kernel = self.add_weight(shape=(input_dim, self.units),
                                      initializer=self.kernel_initializer,
                                      name='kernel',
                                      regularizer=self.kernel_regularizer,
                                      constraint=self.kernel_constraint)
    if self.use_bias:
        self.bias = self.add_weight(shape=(self.units,),
                                    initializer=self.bias_initializer,
                                    name='bias',
                                    regularizer=self.bias_regularizer,
                                    constraint=self.bias_constraint)
    else:
        self.bias = None
    self.input_spec = InputSpec(min_ndim=2, axes={-1: input_dim})
    self.built = True

Below is the __init__ method. The only change here was the addition of the tied_to parameter.

def __init__(self, units,
             activation=None,
             use_bias=True,
             kernel_initializer='glorot_uniform',
             bias_initializer='zeros',
             kernel_regularizer=None,
             bias_regularizer=None,
             activity_regularizer=None,
             kernel_constraint=None,
             bias_constraint=None,
             tied_to=None,
             **kwargs):
    if 'input_shape' not in kwargs and 'input_dim' in kwargs:
        kwargs['input_shape'] = (kwargs.pop('input_dim'),)
    super(Dense, self).__init__(**kwargs)
    self.units = units
    self.activation = activations.get(activation)
    self.use_bias = use_bias
    self.kernel_initializer = initializers.get(kernel_initializer)
    self.bias_initializer = initializers.get(bias_initializer)
    self.kernel_regularizer = regularizers.get(kernel_regularizer)
    self.bias_regularizer = regularizers.get(bias_regularizer)
    self.activity_regularizer = regularizers.get(activity_regularizer)
    self.kernel_constraint = constraints.get(kernel_constraint)
    self.bias_constraint = constraints.get(bias_constraint)
    self.input_spec = InputSpec(min_ndim=2)
    self.supports_masking = True
    self.tied_to = tied_to

The call function was not edited, but it is below for reference.

def call(self, inputs):
    output = K.dot(inputs, self.kernel)
    if self.use_bias:
        output = K.bias_add(output, self.bias, data_format='channels_last')
    if self.activation is not None:
        output = self.activation(output)
    return output

In the build function above, I added a conditional that checks whether the tied_to parameter was set and, if so, sets the layer's kernel to the transpose of the tied_to layer's kernel.

Below is the code used to instantiate the model. It uses Keras's Sequential API, and DenseTied is my custom layer.

# encoder
#
encoded1 = Dense(2, activation="sigmoid")

decoded1 = DenseTied(4, activation="sigmoid", tied_to=encoded1)

# autoencoder
#
autoencoder = Sequential()
autoencoder.add(encoded1)
autoencoder.add(decoded1)

After training the model, below is the model summary and weights.

autoencoder.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_7 (Dense)              (None, 2)                 10        
_________________________________________________________________
dense_tied_7 (DenseTied)     (None, 4)                 12        
=================================================================
Total params: 22
Trainable params: 14
Non-trainable params: 8
________________________________________________________________

autoencoder.layers[0].get_weights()[0]
array([[-2.122982  ,  0.43029135],
       [-2.1772149 ,  0.16689162],
       [-1.0465667 ,  0.9828905 ],
       [-0.6830663 ,  0.0512633 ]], dtype=float32)


autoencoder.layers[-1].get_weights()[1]
array([[-0.6521988 , -0.7131109 ,  0.14814234,  0.26533198],
       [ 0.04387903, -0.22077179,  0.517225  , -0.21583867]],
      dtype=float32)

As you can see, the weights reported by autoencoder.get_weights() do not seem to be tied.
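
One way to make that comparison explicit is a small check on the arrays returned by get_weights(). The sketch below is only an illustration: the helper name `kernels_are_tied` and the tolerance are assumptions, and the arrays are copied from the output above. It reports only whether the two arrays match at the moment they were read.

```python
import numpy as np


def kernels_are_tied(encoder_kernel, decoder_kernel, atol=1e-6):
    """Return True if decoder_kernel equals encoder_kernel transposed.

    Hypothetical helper: it compares values only and knows nothing
    about how the layers were built.
    """
    encoder_kernel = np.asarray(encoder_kernel)
    decoder_kernel = np.asarray(decoder_kernel)
    if decoder_kernel.shape != encoder_kernel.T.shape:
        return False
    return bool(np.allclose(decoder_kernel, encoder_kernel.T, atol=atol))


# Values copied from the get_weights() output above.
encoder_kernel = np.array([[-2.122982, 0.43029135],
                           [-2.1772149, 0.16689162],
                           [-1.0465667, 0.9828905],
                           [-0.6830663, 0.0512633]], dtype=np.float32)
decoder_kernel = np.array([[-0.6521988, -0.7131109, 0.14814234, 0.26533198],
                           [0.04387903, -0.22077179, 0.517225, -0.21583867]],
                          dtype=np.float32)

# The printed decoder array is not the transpose of the encoder array.
print(kernels_are_tied(encoder_kernel, decoder_kernel))
# Sanity check: a true transpose passes.
print(kernels_are_tied(encoder_kernel, encoder_kernel.T))
```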

So after showing my approach, my question is, is this a valid way to tie weights in a Dense Keras layer? I was able to run the code, and it is currently training. The loss seems to be decreasing reasonably as well. My fear is that this will only set the weights equal when the model is built, but not actually tie them. My hope is that the backend transpose function is tying them through references under the hood, but I am sure that I am missing something.

James Mchugh asked Dec 12 '18 20:12

1 Answer

So after showing my approach, my question is, is this a valid way to tie weights in a Dense Keras layer?

Yes, it's valid.

My fear is that this will only set the weights equal when the model is built, but not actually tie them. My hope is that the backend transpose function is tying them through references under the hood, but I am sure that I am missing something.

It actually ties them in the computation graph. You can check by printing model.summary() that there is just one copy of these trainable weights. Also, after training your model, you can compare the weights of the corresponding layers with model.get_weights(). When the model is built there are no weights yet, just placeholders for them.
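
The difference between weights that are tied and weights that were merely set equal once can be illustrated outside Keras. The NumPy sketch below is only an analogy, not a description of the backend's internals: `W.T` is a view that follows later in-place updates to `W`, whereas a copied transpose keeps the values from the moment it was made. All variable names are hypothetical.

```python
import numpy as np

W = np.arange(8, dtype=np.float32).reshape(4, 2)

tied = W.T             # a view onto W's memory, not a new array
snapshot = W.T.copy()  # an independent copy taken "at build time"

W -= 0.5               # in-place update, standing in for an optimizer step

print(np.array_equal(tied, W.T))      # the view still matches W
print(np.array_equal(snapshot, W.T))  # the copy has gone stale
print(np.shares_memory(tied, W))      # the view and W share storage
```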

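# Imports assume standalone Keras 2.x (as used in the question).
import random

import numpy as np
from keras import backend as K
from keras import activations, constraints, initializers, regularizers
from keras.layers import Dense, InputSpec, Layer
from keras.models import Sequential

random.seed(1)
np.random.seed(1)  # the training data below is drawn with np.random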

class DenseTied(Layer):
    def __init__(self, units,
                 activation=None,
                 use_bias=True,
                 kernel_initializer='glorot_uniform',
                 bias_initializer='zeros',
                 kernel_regularizer=None,
                 bias_regularizer=None,
                 activity_regularizer=None,
                 kernel_constraint=None,
                 bias_constraint=None,
                 tied_to=None,
                 **kwargs):
        self.tied_to = tied_to
        if 'input_shape' not in kwargs and 'input_dim' in kwargs:
            kwargs['input_shape'] = (kwargs.pop('input_dim'),)
        super().__init__(**kwargs)
        self.units = units
        self.activation = activations.get(activation)
        self.use_bias = use_bias
        self.kernel_initializer = initializers.get(kernel_initializer)
        self.bias_initializer = initializers.get(bias_initializer)
        self.kernel_regularizer = regularizers.get(kernel_regularizer)
        self.bias_regularizer = regularizers.get(bias_regularizer)
        self.activity_regularizer = regularizers.get(activity_regularizer)
        self.kernel_constraint = constraints.get(kernel_constraint)
        self.bias_constraint = constraints.get(bias_constraint)
        self.input_spec = InputSpec(min_ndim=2)
        self.supports_masking = True

    def build(self, input_shape):
        assert len(input_shape) >= 2
        input_dim = input_shape[-1]

        if self.tied_to is not None:
            self.kernel = K.transpose(self.tied_to.kernel)
            self._non_trainable_weights.append(self.kernel)
        else:
            self.kernel = self.add_weight(shape=(input_dim, self.units),
                                          initializer=self.kernel_initializer,
                                          name='kernel',
                                          regularizer=self.kernel_regularizer,
                                          constraint=self.kernel_constraint)
        if self.use_bias:
            self.bias = self.add_weight(shape=(self.units,),
                                        initializer=self.bias_initializer,
                                        name='bias',
                                        regularizer=self.bias_regularizer,
                                        constraint=self.bias_constraint)
        else:
            self.bias = None

        self.built = True

    def compute_output_shape(self, input_shape):
        assert input_shape and len(input_shape) >= 2
        assert input_shape[-1]  # input dim must be known; it need not equal self.units
        output_shape = list(input_shape)
        output_shape[-1] = self.units
        return tuple(output_shape)

    def call(self, inputs):
        output = K.dot(inputs, self.kernel)
        if self.use_bias:
            output = K.bias_add(output, self.bias, data_format='channels_last')
        if self.activation is not None:
            output = self.activation(output)
        return output


# input_ = Input(shape=(16,), dtype=np.float32)
# encoder
#
encoded1 = Dense(4, activation="sigmoid", input_shape=(4,), use_bias=True)
decoded1 = DenseTied(4, activation="sigmoid", tied_to=encoded1, use_bias=False)

# autoencoder
#
autoencoder = Sequential()
# autoencoder.add(input_)
autoencoder.add(encoded1)
autoencoder.add(decoded1)

autoencoder.compile(optimizer="adam", loss="binary_crossentropy")

autoencoder.summary()  # summary() prints the table itself and returns None

# randint's upper bound is exclusive, so use 2 to draw both 0 and 1 as targets
autoencoder.fit(x=np.random.rand(100, 4), y=np.random.randint(0, 2, size=(100, 4)))

print(autoencoder.layers[0].get_weights()[0])
print(autoencoder.layers[1].get_weights()[0])
Mikhail Berlinkov answered Sep 22 '22 01:09