I am using a variational autoencoder in TensorFlow 2.0 with the Keras API to reconstruct images. The encoder outputs z_mean and z_log_var (layers 'z_mean_layer' and 'z_log_var_layer'), and a Lambda layer draws the latent sample from them with the following function, which samples from a normal distribution:
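    from tensorflow.keras import backend as K

    def sampling(args):
        z_mean, z_log_var = args
        # One N(0, 1) draw per element of z_mean (a fixed shape like (1, 1, 16) shares one draw across the batch)
        epsilon = K.random_normal(shape=K.shape(z_mean))
        # Reparameterization trick: z = mu + exp(0.5 * log_var) * epsilon
        return z_mean + K.exp(0.5 * z_log_var) * epsilon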
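As a plain-Python illustration of what the sampling function computes for one latent vector (the helper name sample_latent and the use of Python's random module are assumptions for illustration, not part of the model):

```python
import math
import random


def sample_latent(z_mean, z_log_var, rng=random):
    """Reparameterization trick for one latent vector: z = mu + sigma * eps.

    sigma = exp(0.5 * log_var), and a fresh eps ~ N(0, 1) is drawn for
    every latent element.
    """
    return [
        mu + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for mu, lv in zip(z_mean, z_log_var)
    ]


z = sample_latent([0.0] * 16, [0.0] * 16, random.Random(0))
print(len(z))
```

Because z is a deterministic function of z_mean, z_log_var and independent noise, gradients can flow back into the encoder; that is the point of the reparameterization trick.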
My hyperparameters are as follows:
epochs = 50
batch size = 16
num_training = 1800
num_val = 100
num_test = 100
learning rate = 0.001
learning rate decay = exponential: the learning rate is multiplied by 0.9 every 5 epochs
optimizer = Adam
shuffle = True
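Reading the decay entry as "multiply the learning rate by 0.9 every 5 epochs" (an interpretation), the schedule can be sketched as a function of the 0-based epoch index. The constant names and the function name learning_rate are hypothetical.

```python
INITIAL_LR = 0.001
DECAY_RATE = 0.9
DECAY_EVERY = 5  # epochs


def learning_rate(epoch, lr=None):
    """Staircase exponential decay: 0.001 * 0.9 ** (epoch // 5).

    The current learning rate `lr` is accepted (Keras may pass it) but
    ignored, so the result depends only on the 0-based epoch index.
    """
    return INITIAL_LR * DECAY_RATE ** (epoch // DECAY_EVERY)


for epoch in (0, 5, 10, 49):
    print(epoch, learning_rate(epoch))
```

In Keras, a function with this signature could be wrapped in tf.keras.callbacks.LearningRateScheduler(learning_rate) and passed to model.fit via callbacks=[...].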
I am using the following loss:
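    from tensorflow.keras import backend as K
    from tensorflow.keras.losses import mse

    def vae_loss(y_true, y_pred):
        # Keras calls losses as loss(y_true, y_pred); mse averages over the last axis only
        mse_loss = mse(y_true, y_pred)
        # Encoder outputs, looked up from the (global) model by layer name
        z_mean = model.get_layer('z_mean_layer').output
        z_log_var = model.get_layer('z_log_var_layer').output
        # Closed-form KL(N(z_mean, exp(z_log_var)) || N(0, 1)), summed over latent dims
        kl_loss = 1 + z_log_var - K.square(z_mean) - K.exp(z_log_var)
        kl_loss = K.sum(kl_loss, axis=-1)
        kl_loss *= -0.5
        return K.mean(mse_loss + kl_loss)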
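To see what the KL term rewards, here is the same closed-form expression for a single latent vector in plain Python (kl_divergence is a hypothetical helper name). It is exactly zero when every z_mean is 0 and every z_log_var is 0, that is, when the encoder outputs the prior regardless of its input, so a near-zero KL loss indicates that the latent code carries almost no information about the image.

```python
import math


def kl_divergence(z_mean, z_log_var):
    """KL(N(mu, exp(log_var)) || N(0, 1)), summed over latent dimensions.

    Same formula as kl_loss in the Keras loss: -0.5 * sum(1 + lv - mu^2 - exp(lv)).
    """
    return -0.5 * sum(
        1.0 + lv - mu ** 2 - math.exp(lv) for mu, lv in zip(z_mean, z_log_var)
    )


print(kl_divergence([0.0] * 16, [0.0] * 16))  # encoder output equals the prior
print(kl_divergence([1.0], [0.0]))            # a shifted mean costs KL
```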
My weights are initialized the default way: kernel_initializer='glorot_uniform', bias_initializer='zeros'.
Each image in my dataset contains a single randomly placed circle.

The background has the value 0, and the circle's value is sampled once per image from a uniform distribution between -1 and 1 and shared by all circle pixels (e.g. 0.987).
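A minimal sketch of such a data generator, for readers who want to reproduce the setup. The 64×64 image size, the radius of 8 pixels and the function name make_circle_image are assumptions (the question does not state them); the scale parameter mirrors the ×500 experiment described below.

```python
import random


def make_circle_image(size=64, radius=8, scale=1.0, rng=random):
    """Hypothetical generator: one filled circle at a random position on a
    zero background. All circle pixels share one value drawn from
    U(-1, 1) and multiplied by `scale`. size and radius are assumed values.
    """
    value = scale * rng.uniform(-1.0, 1.0)
    # Keep the whole circle inside the image
    cx = rng.randint(radius, size - 1 - radius)
    cy = rng.randint(radius, size - 1 - radius)
    image = [[0.0] * size for _ in range(size)]
    for y in range(size):
        for x in range(size):
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                image[y][x] = value
    return image, value


img, v = make_circle_image(rng=random.Random(1))
print(v, sum(p == v for row in img for p in row))
```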
When I train with this configuration, the KL divergence is of magnitude 1e-8, whereas the MSE loss stays at 0.101.
I also always get the same reconstruction regardless of the input: an image with a constant pixel intensity.

Now, if I multiply all input images by 500 (the background stays 0 and the circle's pixel values are uniformly distributed in the range (-500, 500)), the network miraculously starts to learn. In the last epochs the KL loss is of magnitude 50 and the MSE loss of magnitude 250, and the reconstruction works well: the MSE is high, but the circle's contour is in the right place.
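For reference, scaling both the target and the reconstruction by a constant c multiplies every squared error, and therefore the MSE, by c²; the KL expression contains no pixel values, so it is not scaled. The toy pixel values below are made up for illustration.

```python
def mean_squared_error(a, b):
    """Mean squared error between two flattened images (lists of pixel values)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Made-up toy values: two background pixels and two circle pixels
target = [0.0, 0.0, 0.987, 0.987]
recon = [0.1, 0.1, 0.5, 0.5]

c = 500.0
ratio = mean_squared_error([c * x for x in target], [c * y for y in recon]) / mean_squared_error(target, recon)
print(ratio)  # c ** 2, i.e. about 250000 (up to floating-point rounding)
```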

My question is: why can the network not reconstruct images in the range (-1, 1), when it can in the range (-500, 500)?
Floating-point precision is float32.
I have tried numerous learning rates (e.g. 0.00001), but this does not solve the problem. I have also trained for many more epochs (e.g. 200), still with no result.
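As mentioned in the comments, the problem is most likely the scaling of the loss. Your current MSE implementation takes the mean of the squared differences, which is small (about 0.1 here), while the KL term is summed over the latent dimensions. The KL penalty for encoding any information about the input therefore outweighs the small reduction in the averaged MSE that this information could buy, so the optimizer makes the encoder output the prior for every input. That matches the near-zero KL loss and the identical, constant reconstructions you observe. Multiplying the inputs by 500 multiplies the MSE of a comparable reconstruction by 500² = 250,000, which shifts the balance back towards reconstruction, and that is why that version learns. Instead of using the mean, try using the sum of the squared differences over your image. The Keras VAE example (https://keras.io/examples/variational_autoencoder/) does this by multiplying the computed MSE loss by the original image size; in PyTorch this can be specified directly with reduction='sum' (https://github.com/pytorch/examples/blob/234bcff4a2d8480f156799e6b9baae06f7ddc96a/vae/main.py#L74).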
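A minimal sketch of the suggested change, written in plain Python for a single flattened image so the arithmetic is easy to check (the function name vae_loss_sum is hypothetical, and the 4096-pixel, i.e. 64×64, image in the example is an assumption). The only difference from the question's loss is that squared errors are summed over pixels instead of averaged, which equals the mean-based MSE multiplied by the number of pixels.

```python
import math


def vae_loss_sum(x, x_hat, z_mean, z_log_var):
    """Per-image VAE loss with the reconstruction error summed over pixels."""
    # Sum, not mean: equivalent to mean-MSE * number of pixels
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    # Same closed-form KL term as in the Keras loss
    kl = -0.5 * sum(1.0 + lv - mu ** 2 - math.exp(lv) for mu, lv in zip(z_mean, z_log_var))
    return reconstruction + kl


# A 64x64 image whose every pixel is off by 1.0: the mean-based MSE would be 1.0
print(vae_loss_sum([1.0] * 4096, [0.0] * 4096, [0.0] * 16, [0.0] * 16))
```

In the Keras loss from the question, assuming image tensors of shape (batch, height, width, channels), the corresponding reconstruction term could be written as K.sum(K.square(y_true - y_pred), axis=[1, 2, 3]), which gives one summed error per image; alternatively, multiply the mean-based MSE by height × width, as the Keras example does with original_dim.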