I'm trying to use the Variational Autoencoder implementation from the Keras examples (https://github.com/keras-team/keras/blob/master/examples/variational_autoencoder.py).
I just refactored the code so that it is easier to use from a Jupyter notebook (my code: https://github.com/matbell/Autoencoders/blob/master/models/vae.py).
However, when I try to fit the model on my data I get the following output:
Autoencoders/models/vae.py:69: UserWarning: Output "dense_5" missing from loss dictionary. We assume this was done on purpose, and we will not be expecting any data to be passed to "dense_5" during training.
self.vae.compile(optimizer='rmsprop')
Train on 15474 samples, validate on 3869 samples
Epoch 1/50
15474/15474 [==============================] - 1s 76us/step - loss: nan - val_loss: nan
Epoch 2/50
15474/15474 [==============================] - 1s 65us/step - loss: nan - val_loss: nan
Epoch 3/50
15474/15474 [==============================] - 1s 69us/step - loss: nan - val_loss: nan
Epoch 4/50
15474/15474 [==============================] - 1s 62us/step - loss: nan - val_loss: nan
and the loss stays nan for all the training epochs.
I'm not an expert in Deep Learning and Neural Networks, so maybe I'm missing something...
This is the input data, where data and labels are two pandas.DataFrame objects.
In: data.shape
Out: (19343, 87)
In: label.shape
Out: (19343, 1)
And this is how I use the Vae class (from my code) in the Jupyter notebook:
INPUT_SIZE = len(data.columns)
X_train, X_test, y_train, y_test = train_test_split(data, labels, test_size = 0.2)
vae = Vae(INPUT_SIZE, intermediate_dim=32)
vae.fit(X_train, X_test)
Thanks for any help!
You might want to initialize your log_var dense layer to zeros. I was having problems with this myself (slightly different code, but effectively doing the same thing), and it turned out that no matter how small the variance weights were initialized, they would explode within just a few rounds of SGD.
The random correlations between epsilon ~ N(0, 1) and the reconstruction error are enough to gently push the weights away from zero.
Edit: also, the exponential wrapping the log-variance really encourages the gradients to explode. Setting the initial value of the weights to zero gives an initial variance of one, because of the exponential. Initializing them to a low negative value, while giving an initial variance close to zero, makes the gradient enormous on the very first steps. Zero gives me the best results.
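For reference, here is a minimal sketch of what that looks like in the encoder, assuming the usual Keras VAE layout with separate z_mean / z_log_var dense layers and reparameterized sampling. The dimensions and variable names below are illustrative, not taken from your vae.py:

from keras import backend as K
from keras.layers import Input, Dense, Lambda

original_dim = 87       # number of input features (matches data.shape[1] here)
intermediate_dim = 32
latent_dim = 2

x = Input(shape=(original_dim,))
h = Dense(intermediate_dim, activation='relu')(x)

z_mean = Dense(latent_dim)(h)
# Initialize the log-variance layer to zeros: exp(0) = 1, so training starts
# with unit variance instead of an arbitrary (possibly huge or tiny) one.
z_log_var = Dense(latent_dim,
                  kernel_initializer='zeros',
                  bias_initializer='zeros')(h)

def sampling(args):
    # Reparameterization trick: z = mean + exp(log_var / 2) * epsilon
    z_mean, z_log_var = args
    epsilon = K.random_normal(shape=(K.shape(z_mean)[0], latent_dim))
    return z_mean + K.exp(z_log_var / 2) * epsilon

z = Lambda(sampling)([z_mean, z_log_var])

The rest of the model (decoder, reconstruction + KL loss) stays as in the Keras example; the only change is the zero initialization of the z_log_var layer.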