
Variational autoencoder cannot train with small input values

I am using a variational autoencoder to reconstruct images in TensorFlow 2.0 with the Keras API. My model's architecture looks like this:

[image: model architecture]

The Lambda layer samples from a normal distribution using the following function:

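from tensorflow.keras import backend as K

def sampling(args):
    z_mean, z_log_var = args
    # One noise tensor of fixed shape (1, 1, 16) is drawn per call
    epsilon = K.random_normal(shape=(1, 1, 16))
    # Reparameterization trick: z = mean + std * epsilon
    return z_mean + K.exp(0.5 * z_log_var) * epsilon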
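
For comparison, here is a minimal NumPy sketch of the same reparameterization step in which the noise has the same shape as z_mean, so every sample in the batch gets its own draw instead of sharing one fixed-shape noise tensor through broadcasting. The helper name sample_latent, the batch size of 16 and the latent shape (16, 16) are assumptions for illustration; the question does not show the actual latent shape.

```python
import numpy as np

def sample_latent(z_mean, z_log_var, rng):
    """Reparameterization trick: z = mean + std * epsilon, epsilon ~ N(0, 1)."""
    # Noise is drawn with the full shape of z_mean, batch axis included
    epsilon = rng.standard_normal(size=z_mean.shape)
    return z_mean + np.exp(0.5 * z_log_var) * epsilon

rng = np.random.default_rng(0)
z_mean = np.zeros((16, 16))            # hypothetical: batch of 16, latent size 16
z_log_var = np.full((16, 16), -2.0)    # hypothetical log-variances
z = sample_latent(z_mean, z_log_var, rng)
```

In Keras the equivalent would usually be to build the noise shape from the shape of z_mean rather than hard-coding it, but whether that matters here depends on the architecture, which is only shown in the image.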

My hyperparameters are as follows:

epochs = 50
batch size = 16
num_training = 1800
num_val = 100
num_test = 100
learning rate = 0.001
exponential decay = 0.9 * initial learning rate (calculated every 5 epochs)
optimizer = Adam
shuffle = True

I am using the following loss:

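# mse is assumed to be tensorflow.keras.losses.mse; model is the VAE defined above
def vae_loss(y_pred, y_gt):
    # Mean of the squared differences over the last axis
    mse_loss = mse(y_pred, y_gt)
    z_mean = model.get_layer('z_mean_layer').output
    z_log_var = model.get_layer('z_log_var_layer').output
    # KL divergence between N(z_mean, exp(z_log_var)) and N(0, 1)
    kl_loss = 1 + z_log_var - K.square(z_mean) - K.exp(z_log_var)
    kl_loss = K.sum(kl_loss, axis=-1)
    kl_loss *= -0.5
    return K.mean(mse_loss + kl_loss)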

My weights are initialized the default way: kernel_initializer='glorot_uniform', bias_initializer='zeros'.

Each image in my dataset contains a randomly placed circle, like this:

[image: example input with a randomly placed circle]

The background has the value 0 and the circle's value is sampled from a uniform distribution between -1 and 1, e.g. 0.987 for all circle pixels.

When I train with this configuration, I get the following loss.

[image: training loss curves]

The KL divergence is of magnitude 1e-8, whereas the MSE loss stays at 0.101.
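
As a numerical check of what a KL term this close to zero means, here is a small NumPy sketch of the same closed-form KL expression used in the loss above. The function name kl_to_standard_normal and the latent size of 16 (taken from the last dimension of the epsilon shape) are assumptions for illustration.

```python
import numpy as np

def kl_to_standard_normal(z_mean, z_log_var):
    """KL( N(z_mean, exp(z_log_var)) || N(0, 1) ), summed over the last axis."""
    return -0.5 * np.sum(
        1.0 + z_log_var - np.square(z_mean) - np.exp(z_log_var), axis=-1
    )

# Encoder output identical to the prior: mean 0, log-variance 0
collapsed = kl_to_standard_normal(np.zeros(16), np.zeros(16))

# Encoder output that moves the mean away from 0 in every latent dimension
informative = kl_to_standard_normal(np.ones(16), np.zeros(16))
```

The KL term is zero only when z_mean is 0 and z_log_var is 0, so a value around 1e-8 suggests the encoder outputs almost the same distribution for every input and the latent code carries very little information about the image.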

I always get the same reconstruction regardless of the input: an image with a constant pixel intensity.

[image: reconstruction with constant pixel intensity]

Now, if I multiply all input images by 500 (i.e. the background stays zero and the circle pixel values are uniformly distributed in the range (-500, 500)), the network miraculously starts to learn.

[image: training loss curves with inputs scaled by 500]

In the last epochs, the KL loss is of magnitude 50 and the MSE loss of magnitude 250.

And the image reconstruction works well. Basically, the MSE metric is high, but the circle contour is positioned in the right place.

[image: reconstruction with inputs scaled by 500]

My question is: why can the network not reconstruct images in the range (-1, 1), when it can in the range (-500, 500)?

Machine precision is set to float32.

I have tried several learning rates, e.g. 0.00001, but this does not solve the problem. I have also trained for more epochs, e.g. 200, still with no result.

Daka asked Jun 15 '26 19:06



1 Answer

As mentioned in the comments, the problem is probably the scaling of the loss. Your current MSE loss is the mean of the squared differences, which is fairly small for inputs in (-1, 1), so the KL term dominates and the network can lower the total loss by matching the prior and ignoring the input. Multiplying the inputs by 500 scales the MSE by 500^2 = 250000, which would explain why training then works. Instead of the mean, try using the sum of the squared differences over your image. The Keras VAE example (https://keras.io/examples/variational_autoencoder/) does this by scaling the computed MSE loss by the original image size (in PyTorch this can be specified directly: https://github.com/pytorch/examples/blob/234bcff4a2d8480f156799e6b9baae06f7ddc96a/vae/main.py#L74).
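
The scaling argument can be checked with a small NumPy sketch. Everything specific here is an assumption for illustration: the question does not give the image size, so 32x32 is hypothetical, a square patch stands in for the circle, and reconstruction_losses is a made-up helper name. The prediction is an all-zero image, i.e. a decoder that ignores its input.

```python
import numpy as np

def reconstruction_losses(y_true, y_pred):
    """Return (mean, sum) of squared differences per image, averaged over the batch."""
    squared = np.square(y_true - y_pred)
    flat = squared.reshape(squared.shape[0], -1)   # one row per image
    return flat.mean(axis=1).mean(), flat.sum(axis=1).mean()

# Hypothetical batch: 4 images of 32x32, background 0, an 8x8 patch of value 0.5
y_true = np.zeros((4, 32, 32))
y_true[:, 8:16, 8:16] = 0.5
y_pred = np.zeros_like(y_true)   # constant reconstruction that ignores the input

mean_loss, sum_loss = reconstruction_losses(y_true, y_pred)

# Same images multiplied by 500, as in the question
scaled_mean, scaled_sum = reconstruction_losses(500.0 * y_true, y_pred)
```

With these assumed values the mean-based loss is 0.015625 and the sum-based loss is 16.0, a factor of 32 * 32 = 1024, and multiplying the inputs by 500 multiplies both by 250000. Either change makes the reconstruction term large enough relative to the KL term that ignoring the input is no longer cheap.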

mibaumgartner answered Jun 19 '26 20:06
