I am trying to understand VAEs in depth by implementing one myself, and I am having difficulty back-propagating the loss from the decoder input layer to the encoder output layer.
My encoder network outputs 8 pairs (sigma, mu), which I then combine with the output of a stochastic sampler to produce the 8 input values (z) for the decoder network:
decoder_in = sigma * N(0,I) + mu
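In code, this sampling step looks roughly like this (a NumPy sketch; the placeholder arrays and all names are mine, standing in for my encoder's actual outputs):

    import numpy as np

    latent_dim = 8
    # placeholders standing in for the encoder's outputs
    mu = np.zeros(latent_dim)
    sigma = np.ones(latent_dim)

    epsilon = np.random.randn(latent_dim)  # one sample from N(0, I), kept for the backward pass
    decoder_in = sigma * epsilon + mu      # the z vector fed into the decoder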
Then I run forward propagation through the decoder network, compute the MSE reconstruction loss, and back-propagate the gradients up to the decoder input layer.
Here I am completely stuck, since I cannot find a comprehensible explanation of how to back-propagate the loss from the decoder input layer to the encoder output layer.
My best idea was to store the samples drawn from N(0,I) as epsilon and use them like this:
L(sigma) = epsilon * dLz(decoder_in)
L(mu) = 1.0 * dLz(decoder_in)
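In code, this idea looks roughly like this (again a NumPy sketch with placeholder values; dL_dz stands for the gradient of the reconstruction loss with respect to decoder_in):

    import numpy as np

    latent_dim = 8
    epsilon = np.random.randn(latent_dim)  # the N(0, I) sample stored during the forward pass

    # gradient of the MSE loss w.r.t. decoder_in, obtained by back-propagating
    # through the decoder (placeholder values here)
    dL_dz = np.ones(latent_dim)

    # chain rule through decoder_in = sigma * epsilon + mu
    grad_sigma = epsilon * dL_dz  # d(decoder_in)/d(sigma) = epsilon
    grad_mu = 1.0 * dL_dz         # d(decoder_in)/d(mu) = 1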
It kind of works, but in the long run the sigma components of the encoded distribution vectors tend to regress to zero, so my VAE effectively degenerates into a plain AE.
Also, I still have no idea how to integrate the KL loss into this scheme. Should I add it to the encoder loss, or somehow combine it with the decoder's MSE loss?
A Variational Autoencoder is an explicit generative model used to generate new sample data resembling past data. VAEs learn a mapping to latent variables that explain the training data and capture its underlying distribution.
The variational autoencoder addresses the non-regularized latent space of a plain autoencoder and provides generative capability over the entire latent space. The encoder in a plain AE outputs latent vectors directly.
The ELBO is a lower bound on the logarithm of the marginal likelihood log p(x; θ) and is constructed by introducing an extra distribution q(z|x). The closer q(z|x) and the posterior p(z|x; θ) are, the tighter the bound. Both the EM algorithm and the VAE iteratively optimize the ELBO.
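In standard notation (my own rendering, not from the text above), the identity behind this statement is:

log p(x; θ) = ELBO(q) + KL( q(z|x) || p(z|x; θ) )

Since the KL term is non-negative, the ELBO never exceeds log p(x; θ), and the bound becomes tight exactly when q(z|x) matches the posterior.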
An autoencoder is made of two connected neural networks: an encoder model and a decoder model. Its goal is to find a way to encode the input (for example, celebrity face images) into a compressed form (the latent space) such that the reconstructed version is as close as possible to the input.
To alleviate the issues present in a vanilla autoencoder (chiefly its non-regularized latent space), we turn to variational autoencoders. The first change a VAE introduces is that instead of mapping each input data point directly to a latent variable, it maps each input data point to a multivariate normal distribution.
In this article, we are going to learn about the “reparameterization” trick that makes Variational Autoencoders (VAE) an eligible candidate for Backpropagation. First, we will discuss Autoencoders briefly and the problems that come with their vanilla variants. Then we will jump straight to the crux of the article — the “reparameterization” trick.
The function of the decoder is to generate an output from the latent vector that is very close to the input. Usually, in training autoencoders, we build these components together instead of building them independently.
General autoencoders are trained using a reconstruction loss, which measures the difference between the reconstructed and original image. Variational autoencoders are trained mostly the same way, except that the bottleneck vector is sampled from a learned normal distribution, which regularizes the latent space and reduces overfitting.
The VAE does not use the reconstruction error alone as its cost objective; if you use only that, the model just turns back into an autoencoder. Instead, the VAE uses the variational lower bound, together with a couple of neat tricks that make it easy to compute.
Referring to the original “Auto-Encoding Variational Bayes” paper:
The variational lower bound objective is (eq 10):
1/2 * sum_j( 1 + log(sigma_j^2) - mu_j^2 - sigma_j^2 ) + log p(x|z)
where j indexes the d latent dimensions, mu and sigma are the outputs of the encoding neural network used to shift and scale the standard normal samples, and z is the encoded sample. p(x|z) is just the decoder's probability of generating back the input x.
Every term in this expression is differentiable with respect to mu and sigma (thanks to the reparameterization), so the objective can be optimized with gradient descent or any other gradient-based optimizer you find in TensorFlow.
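For illustration, a minimal TensorFlow sketch of this objective might look as follows (my own code, not from the original answer, with toy one-layer networks standing in for the real encoder and decoder). Because z is built from mu, sigma and an external epsilon inside the GradientTape, autodiff pushes the gradients back into the encoder automatically:

    import tensorflow as tf

    latent_dim = 8
    encoder = tf.keras.Sequential([tf.keras.layers.Dense(2 * latent_dim)])  # outputs [mu, log(sigma^2)]
    decoder = tf.keras.Sequential([tf.keras.layers.Dense(784)])             # toy decoder

    x = tf.random.normal([32, 784])  # stand-in for a batch of inputs

    with tf.GradientTape() as tape:
        mu, log_var = tf.split(encoder(x), 2, axis=1)
        sigma = tf.exp(0.5 * log_var)
        epsilon = tf.random.normal(tf.shape(mu))
        z = mu + sigma * epsilon                     # reparameterization trick
        x_hat = decoder(z)

        # squared error as a stand-in for -log p(x|z)
        recon = tf.reduce_sum(tf.square(x - x_hat), axis=1)
        # KL(N(mu, sigma^2) || N(0, I)), i.e. the negated first part of eq 10
        kl = -0.5 * tf.reduce_sum(1 + log_var - tf.square(mu) - tf.exp(log_var), axis=1)
        loss = tf.reduce_mean(recon + kl)            # negative of the lower bound

    grads = tape.gradient(loss, encoder.trainable_variables + decoder.trainable_variables)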
From what I understand, the solution should look like this:
L(sigma) = epsilon * dLz(decoder_in) - 0.5 * 2 / sigma + 0.5 * 2 * sigma
L(mu) = 1.0 * dLz(decoder_in) + 0.5 * 2 * mu
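As a concrete sketch of that combined backward step (my own NumPy code, with placeholder values in place of the real decoder gradient):

    import numpy as np

    latent_dim = 8
    mu = np.zeros(latent_dim)
    sigma = np.ones(latent_dim)
    epsilon = np.random.randn(latent_dim)  # stored from the forward pass

    # gradient of the reconstruction loss w.r.t. decoder_in (placeholder here;
    # in the real model it comes from back-propagating MSE through the decoder)
    dL_dz = np.ones(latent_dim)

    # KL(N(mu, sigma^2) || N(0, 1)) = 0.5 * sum(mu^2 + sigma^2 - log(sigma^2) - 1)
    grad_sigma = epsilon * dL_dz + (sigma - 1.0 / sigma)  # reconstruction + d(KL)/d(sigma)
    grad_mu = 1.0 * dL_dz + mu                            # reconstruction + d(KL)/d(mu)

With the KL gradients included, sigma is pulled toward 1 instead of collapsing to 0. This also answers the KL question above: the KL term is simply added to the reconstruction loss in one total objective, and since it depends only on mu and sigma, its gradients flow into the encoder outputs alone.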