I am working with a Variational Autoencoder-type model, and part of my loss function is the KL divergence between a normal distribution with mean 0 and variance 1 and another normal distribution whose mean and variance are predicted by my model.
I defined the loss in the following way:
import tensorflow as tf

def kl_loss(mean, log_sigma):
    # Standard normal prior N(0, I) with the same shape as the encoder output.
    normal = tf.contrib.distributions.MultivariateNormalDiag(
        tf.zeros(mean.get_shape()),
        tf.ones(log_sigma.get_shape()))
    # Encoder distribution; exp(log_sigma) is used as the per-dimension std dev.
    enc_normal = tf.contrib.distributions.MultivariateNormalDiag(
        mean,
        tf.exp(log_sigma),
        validate_args=True,
        allow_nan_stats=False,
        name="encoder_normal")
    # KL(prior || encoder distribution).
    kl_div = tf.contrib.distributions.kl_divergence(
        normal,
        enc_normal,
        allow_nan_stats=False,
        name="kl_divergence")
    return kl_div
The inputs are unconstrained vectors of length N with log_sigma.get_shape() == mean.get_shape().
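For reference, the quantity this code computes has a simple closed form, which I could also write directly with TensorFlow ops (a minimal sketch under the same convention that log_sigma is the log of the standard deviation; kl_closed_form is just an illustrative name, not part of my model):

import tensorflow as tf

def kl_closed_form(mean, log_sigma):
    # Closed-form KL( N(0, I) || N(mean, diag(exp(log_sigma)^2)) ),
    # i.e. the same argument order as kl_loss above:
    # KL = 0.5 * sum( 1/sigma_i^2 + mean_i^2/sigma_i^2 - 1 + 2*log_sigma_i )
    inv_var = tf.exp(-2.0 * log_sigma)  # 1 / sigma_i^2
    return 0.5 * tf.reduce_sum(
        inv_var * (1.0 + tf.square(mean)) - 1.0 + 2.0 * log_sigma,
        axis=-1)

Since the analytic value is always >= 0, any negative output can only come from floating-point rounding.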
Now, during training I observe a negative KL divergence after a few thousand iterations, reaching values down to about -10. Below you can see the Tensorboard training curves:
[Tensorboard screenshot: KL divergence curve]
[Tensorboard screenshot: zoom-in of the KL divergence curve]
This seems odd to me, since the KL divergence should be non-negative whenever it is defined. I understand that "The K-L divergence is only defined if P and Q both sum to 1 and if Q(i) > 0 for any i such that P(i) > 0" (see https://mathoverflow.net/questions/43849/how-to-ensure-the-non-negativity-of-kullback-leibler-divergence-kld-metric-rela), but I don't see how this could be violated in my case. Any help is highly appreciated!
The Wikipedia article on KL divergence (properties section) states that the KL divergence can never be negative.
In simple terms, the KL divergence is a measure of how different two probability distributions (say P and Q) are from each other, which is exactly what we care about when defining this loss term.
It tells us how well Q approximates P, and it can be computed as the cross-entropy between P and Q minus the entropy of P. Intuitively, you can think of it as a statistical measure of how one distribution differs from another.
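As a small numerical illustration of that "cross-entropy minus entropy" identity (plain NumPy, with two made-up discrete distributions):

import numpy as np

# Two arbitrary discrete distributions over three outcomes.
p = np.array([0.6, 0.3, 0.1])
q = np.array([0.5, 0.25, 0.25])

cross_entropy = -np.sum(p * np.log(q))   # H(p, q)
entropy = -np.sum(p * np.log(p))         # H(p)
kl = cross_entropy - entropy             # equals sum(p * log(p / q))

print(kl)                                # ~0.072 here, and always >= 0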
I faced the same problem. It happens because of the floating-point precision used: notice that the negative values occur close to 0 and are bounded by a small negative value. Adding a small positive value to the loss is a workaround.
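For example (a rough sketch applied to the kl_div tensor from the question; eps, kl_shifted, and kl_clipped are just illustrative names, and clipping is an alternative to adding the constant):

import tensorflow as tf

# kl_div is the tensor returned by kl_loss(mean, log_sigma) in the question.
eps = 1e-8  # small positive constant; the exact value is arbitrary

# Option 1 (the suggestion above): shift the loss by a tiny positive amount.
kl_shifted = kl_div + eps

# Option 2: clip at zero so rounding can never produce a negative loss term.
kl_clipped = tf.maximum(kl_div, 0.0)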