Variational autoencoder cannot train with small input values

Question

I am using a variational autoencoder to reconstruct images in TensorFlow 2.0 with the Keras API. My model's architecture looks like this:

[image: model architecture]

The lambda layer uses a function that samples from a normal distribution:

def sampling(args):
    z_mean, z_log_var = args
    epsilon = K.random_normal(shape =(1,1,16))
    return z_mean + K.exp(0.5 * z_log_var) * epsilon

My hyperparameters are as follows:

epochs = 50
batch size = 16
num_training = 1800
num_val = 100
num_test = 100
learning rate = 0.001
exponential decay = 0.9 * initial learning rate (calculated every 5 epochs)
optimizer = Adam
shuffle = True

I am using the following loss:

    def vae_loss(y_pred, y_gt):
        mse_loss = mse(y_pred, y_gt)
        z_mean = model.get_layer('z_mean_layer').output
        z_log_var = model.get_layer('z_log_var_layer').output
        kl_loss = 1 + z_log_var - K.square(z_mean) - K.exp(z_log_var)
        kl_loss = K.sum(kl_loss, axis=-1)
        kl_loss *= -0.5
        return K.mean(mse_loss + kl_loss)

My weights are initialized the default way: kernel_initializer='glorot_uniform', bias_initializer='zeros'.

Each image in my dataset consists of a randomly placed circle, which looks like this:

[image: example input]

The background has the value 0. The circle's value is sampled once per image from a uniform distribution between -1 and 1 and applied to all circle pixels, e.g. 0.987.

When I train with this configuration, I get the following loss:

[image: loss plot]

The KL divergence is of magnitude 1e-8, whereas the MSE loss stays at 0.101.

I also always get the same reconstruction regardless of the input: an image with a constant pixel intensity.

[image: reconstruction]

Now, if I multiply all input images by 500 (i.e. the background stays zero and the circle pixel values are uniformly distributed in the range (-500, 500)), the network miraculously starts to learn:

[image]

In the last epochs, the KL loss is of magnitude 50 and the MSE loss of magnitude 250.

The image reconstruction also works well. The MSE metric is high, but the circle contour is positioned in the right place:

[image: reconstruction]

My question is: why can the network not reconstruct images in the range (-1, 1), when it can do so in the range (-500, 500)?

Machine precision is set to float32.

I have tried numerous learning rates, e.g. 0.00001, but this does not solve the problem. I have also trained for more epochs, e.g. 200, still with no result.

mibaumgartner · Accepted Answer

As mentioned in the comments, the problem is probably the scaling of the loss. Your current MSE loss takes the mean of the squared differences, which is fairly small. Instead of the mean, try using the sum of the squared differences over your image. The Keras VAE example (https://keras.io/examples/variational_autoencoder/) does this by scaling the computed MSE loss by the original image size. In PyTorch the reduction can be specified directly: https://github.com/pytorch/examples/blob/234bcff4a2d8480f156799e6b9baae06f7ddc96a/vae/main.py#L74.
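To make the scaling argument concrete, here is a minimal NumPy sketch rather than the asker's actual model. The 64x64 image size, the square used as a stand-in for the circle, the all-zero "collapsed" reconstruction, and the helper names `reconstruction_terms` and `kl_term` are all assumptions for illustration; only the KL formula and the latent size of 16 are taken from the question.

```python
import numpy as np


def reconstruction_terms(y_true, y_pred):
    """Per-image squared error, reduced by mean and by sum (hypothetical helper)."""
    sq_err = (y_true - y_pred) ** 2
    flat = sq_err.reshape(sq_err.shape[0], -1)
    return flat.mean(axis=1), flat.sum(axis=1)


def kl_term(z_mean, z_log_var):
    """KL divergence to a standard normal, summed over latent dimensions
    (same formula as kl_loss in the question)."""
    return -0.5 * np.sum(1 + z_log_var - z_mean ** 2 - np.exp(z_log_var), axis=-1)


rng = np.random.default_rng(0)
# Image size is an assumption; latent_dim = 16 follows the question's epsilon shape.
batch, height, width, latent_dim = 4, 64, 64, 16

# Zero background with one constant-valued patch per image (square stand-in for the circle).
y_true = np.zeros((batch, height, width))
y_true[:, 20:30, 20:30] = rng.uniform(-1, 1, size=(batch, 1, 1))

# A constant reconstruction, similar to the collapsed output described in the question.
y_pred = np.zeros_like(y_true)

# Arbitrary latent statistics, only to give the KL term a nonzero value.
z_mean = 0.5 * rng.standard_normal((batch, latent_dim))
z_log_var = np.zeros((batch, latent_dim))

mse_mean, sse_sum = reconstruction_terms(y_true, y_pred)
kl = kl_term(z_mean, z_log_var)

print("mean-reduced reconstruction:", mse_mean)
print("sum-reduced reconstruction: ", sse_sum)
print("KL term:                    ", kl)

# Scaling inputs and outputs by 500 multiplies the squared error by 500**2,
# while the KL term does not depend on the pixel scale at all.
mse_mean_scaled, _ = reconstruction_terms(500 * y_true, 500 * y_pred)
print("ratio after scaling by 500: ", mse_mean_scaled / mse_mean)
```

With the mean reduction, the reconstruction term is smaller than the summed error by a factor of height * width (4096 under the assumed image size), so even a small KL term can outweigh it and push the encoder toward the prior. Multiplying the data by 500 raises the squared error by 500**2 for the same relative error, which is one plausible reason the rescaled inputs trained; summing over pixels restores the balance without rescaling the data.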


Tags: neural-network, tensorflow, deep-learning, keras

Daka
