
Tied weights in Autoencoder

I have been looking at autoencoders and have been wondering whether to use tied weights or not. I intend to stack them as a pretraining step and then use their hidden representations to feed an NN.

Using untied weights it would look like:

f(x) = σ2(b2 + W2 σ1(b1 + W1 x))

Using tied weights it would look like:

f(x) = σ2(b2 + W1^T σ1(b1 + W1 x))
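
To make the difference concrete, here is a minimal sketch of the tied-weight version (PyTorch is assumed here, and the layer sizes and sigmoid activations are only illustrative); the untied version would simply learn its own W2 instead of reusing W1 transposed:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TiedAutoencoder(nn.Module):
    def __init__(self, n_in=784, n_hidden=128):
        super().__init__()
        # Only the encoder weight and the two biases are learned parameters;
        # the decoder reuses W1 transposed.
        self.W1 = nn.Parameter(torch.randn(n_hidden, n_in) * 0.01)
        self.b1 = nn.Parameter(torch.zeros(n_hidden))
        self.b2 = nn.Parameter(torch.zeros(n_in))

    def encode(self, x):
        # h = sigma1(b1 + W1 x)
        return torch.sigmoid(F.linear(x, self.W1, self.b1))

    def forward(self, x):
        h = self.encode(x)
        # reconstruction = sigma2(b2 + W1^T h)
        return torch.sigmoid(F.linear(h, self.W1.t(), self.b2))
```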

From a very simplistic view, could one say that tying the weights ensures the encoder part generates the best representation possible given the architecture, whereas with independent weights the decoder could effectively take a non-optimal representation and still decode it?

I ask because if the decoder is where the "magic" occurs, and I intend to use only the encoder to drive my NN, wouldn't that be problematic?

asked Apr 27 '16 by Paul O

People also ask

What is bottleneck in autoencoder?

Bottleneck: it is the lower-dimensional hidden layer where the encoding is produced. The bottleneck layer has fewer nodes, and the number of nodes in the bottleneck layer gives the dimension of the encoding of the input.

Why should you use the transpose weights of encoder or decoder weights in auto encoder?

The weight matrix of the decoding stage is the transpose of the weight matrix of the encoding stage, in order to reduce the number of parameters to learn. We want to optimize W, b, and b′ so that the reconstruction is as similar to the original input as possible with respect to some loss function.
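
As a quick, purely illustrative count (the dimensions here are assumed, not from the original text): for input dimension d and code dimension k,

untied: W1 (k×d), b1 (k), W2 (d×k), b2 (d) → 2dk + k + d parameters
tied: W1 (k×d), b1 (k), b2 (d), with the decoder reusing W1^T → dk + k + d parameters

so tying the weights roughly halves the number of learned parameters when d and k are large.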

How do I increase the accuracy of my autoencoder?

An autoencoder can improve learning accuracy with regularization, which can be a sparsity regularizer, a contractive regularizer [5], or a denoising form of regularization [6]. Recent work [7] has shown that regularization can also be used to prevent feature co-adaptation via dropout training.
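
As one hypothetical illustration of the denoising variant (reusing the TiedAutoencoder sketch from the question above; the noise level and optimizer settings are arbitrary):

```python
import torch
import torch.nn.functional as F

# TiedAutoencoder is the sketch class defined earlier in this post.
model = TiedAutoencoder(n_in=784, n_hidden=128)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def denoising_step(x_clean):
    # Corrupt the input, then train the autoencoder to reconstruct the *clean* input.
    x_noisy = x_clean + 0.3 * torch.randn_like(x_clean)
    x_rec = model(x_noisy)
    loss = F.mse_loss(x_rec, x_clean)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```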

What is sparsity in autoencoder?

A sparse autoencoder is a type of autoencoder that employs sparsity to achieve an information bottleneck. Specifically, the loss function is constructed so that activations within a layer are penalized.
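
One common way to realize that penalty (a sketch only; the L1 weight is arbitrary, and KL-divergence penalties on the mean activation are another option):

```python
import torch.nn.functional as F

def sparse_loss(model, x, sparsity_weight=1e-3):
    # Reconstruction error plus an L1 penalty on the coding-layer activations.
    h = model.encode(x)     # hidden activations (TiedAutoencoder sketch above)
    x_rec = model(x)        # reconstruction
    return F.mse_loss(x_rec, x) + sparsity_weight * h.abs().mean()
```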


1 Answer

Autoencoders with tied weights have some important advantages:

  1. They are easier to learn.
  2. In the linear case they are equivalent to PCA - this may lead to a more geometrically adequate coding (see the sketch below).
  3. Tied weights act as a sort of regularisation.

But of course they're not perfect: they may not be optimal when your data comes from a highly nonlinear manifold. Depending on the size of your data, I would try both approaches - with tied weights and without - if possible.
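
As a small illustration of point 2 above (a hypothetical NumPy check, not part of the original answer): with linear activations and no biases, the tied autoencoder computes W1^T W1 x, which is exactly the form of a PCA reconstruction:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10)) @ rng.normal(size=(10, 10))  # correlated toy data
X = X - X.mean(axis=0)

# Top-k principal directions via SVD.
k = 3
_, _, Vt = np.linalg.svd(X, full_matrices=False)
W = Vt[:k]  # (k, d): plays the role of W1

# PCA reconstruction has the tied linear-autoencoder form x_hat = W^T W x.
X_hat = X @ W.T @ W
print("relative reconstruction error:",
      np.linalg.norm(X - X_hat) / np.linalg.norm(X))
```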

UPDATE:

You also asked why the representation that comes from an autoencoder with tied weights might be better than one without. Of course, such a representation is not always better, but if the reconstruction error is sensible then the different units in the coding layer represent something that might be considered as generators of orthogonal features explaining most of the variance in the data (exactly like PCA does). This is why such a representation might be quite useful in a further phase of learning.

answered Sep 21 '22 by Marcin Możejko