I am working with a Variational Autoencoder-type model, and part of my loss function is the KL divergence between a normal distribution with mean 0 and variance 1 and another normal distribution whose mean and variance are predicted by my model.
I defined the loss in the following way:
import tensorflow as tf

def kl_loss(mean, log_sigma):
    # Standard normal prior N(0, I) with the same shape as the encoder output.
    normal = tf.contrib.distributions.MultivariateNormalDiag(
        tf.zeros(mean.get_shape()),
        tf.ones(log_sigma.get_shape()))
    # Encoder distribution; exp(log_sigma) is used as the per-dimension std dev.
    enc_normal = tf.contrib.distributions.MultivariateNormalDiag(
        mean,
        tf.exp(log_sigma),
        validate_args=True,
        allow_nan_stats=False,
        name="encoder_normal")
    # KL(prior || encoder distribution).
    kl_div = tf.contrib.distributions.kl_divergence(
        normal,
        enc_normal,
        allow_nan_stats=False,
        name="kl_divergence")
    return kl_div
The inputs are unconstrained vectors of length N with log_sigma.get_shape() == mean.get_shape().
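For reference, the quantity this code computes has a simple closed form, which I could also write directly with TensorFlow ops (a minimal sketch under the same convention that log_sigma is the log of the standard deviation; kl_closed_form is just an illustrative name, not part of my model):

import tensorflow as tf

def kl_closed_form(mean, log_sigma):
    # Closed-form KL( N(0, I) || N(mean, diag(exp(log_sigma)^2)) ),
    # i.e. the same argument order as kl_loss above:
    # KL = 0.5 * sum( 1/sigma_i^2 + mean_i^2/sigma_i^2 - 1 + 2*log_sigma_i )
    inv_var = tf.exp(-2.0 * log_sigma)  # 1 / sigma_i^2
    return 0.5 * tf.reduce_sum(
        inv_var * (1.0 + tf.square(mean)) - 1.0 + 2.0 * log_sigma,
        axis=-1)

Since the analytic value is always >= 0, any negative output can only come from floating-point rounding.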
Now, during training I observe a negative KL divergence after a few thousand iterations, reaching values down to about -10. Below you can see the Tensorboard training curves:
[Tensorboard screenshot: KL divergence curve]
[Tensorboard screenshot: zoom-in of the KL divergence curve]
This seems odd to me, since the KL divergence should be non-negative whenever it is defined. I understand that "The K-L divergence is only defined if P and Q both sum to 1 and if Q(i) > 0 for any i such that P(i) > 0" (see https://mathoverflow.net/questions/43849/how-to-ensure-the-non-negativity-of-kullback-leibler-divergence-kld-metric-rela), but I don't see how this could be violated in my case. Any help is highly appreciated!
The Wikipedia article on KL divergence (properties section) states that the KL divergence can never be negative.
In simple terms, the KL divergence is a measure of how different two probability distributions (say P and Q) are from each other, which is exactly what we care about when defining this loss term.
It tells us how well Q approximates P, and it can be computed as the cross-entropy between P and Q minus the entropy of P. Intuitively, you can think of it as a statistical measure of how one distribution differs from another.
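As a small numerical illustration of that "cross-entropy minus entropy" identity (plain NumPy, with two made-up discrete distributions):

import numpy as np

# Two arbitrary discrete distributions over three outcomes.
p = np.array([0.6, 0.3, 0.1])
q = np.array([0.5, 0.25, 0.25])

cross_entropy = -np.sum(p * np.log(q))   # H(p, q)
entropy = -np.sum(p * np.log(p))         # H(p)
kl = cross_entropy - entropy             # equals sum(p * log(p / q))

print(kl)                                # ~0.072 here, and always >= 0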
I faced the same problem. It happens because of the floating-point precision used: notice that the negative values occur close to 0 and are bounded by a small negative value. Adding a small positive value to the loss is a workaround.
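For example (a rough sketch applied to the kl_div tensor from the question; eps, kl_shifted, and kl_clipped are just illustrative names, and clipping is an alternative to adding the constant):

import tensorflow as tf

# kl_div is the tensor returned by kl_loss(mean, log_sigma) in the question.
eps = 1e-8  # small positive constant; the exact value is arbitrary

# Option 1 (the suggestion above): shift the loss by a tiny positive amount.
kl_shifted = kl_div + eps

# Option 2: clip at zero so rounding can never produce a negative loss term.
kl_clipped = tf.maximum(kl_div, 0.0)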