I have two tensors, prob_a and prob_b, with shape [None, 1000], and I want to compute the KL divergence from prob_a to prob_b. Is there a built-in function for this in TensorFlow? I tried using tf.contrib.distributions.kl(prob_a, prob_b), but it gives:

NotImplementedError: No KL(dist_a || dist_b) registered for dist_a type Tensor and dist_b type Tensor

If there is no built-in function, what would be a good workaround?
The KL divergence from P to Q is defined as

KL(P || Q) = sum_x p(x) * log(p(x) / q(x))

where P is the "true" distribution. Note that -sum_x p(x) * log q(x) is the cross entropy between P and Q, and KL(P || Q) is exactly that cross entropy minus the entropy of P, so a KL divergence loss can also be built from a cross-entropy loss. In TensorFlow, tf.distributions.kl_divergence() computes it directly; however, it may produce nan or inf values. To avoid this, clamp the tensor values away from zero before computing.
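To make the clamping concrete, here is a minimal sketch of computing the KL divergence by hand with tf.clip_by_value (TensorFlow 1.x, matching the answer below); the epsilon of 1e-10 is an arbitrary illustrative choice, not a value prescribed by TensorFlow:

import numpy as np
import tensorflow as tf

def kl_manual(p, q, eps=1e-10):
    # Clamp both distributions away from zero so tf.log stays finite.
    p = tf.clip_by_value(p, eps, 1.0)
    q = tf.clip_by_value(q, eps, 1.0)
    # KL(P || Q) = sum_x p(x) * (log p(x) - log q(x))
    #            = cross_entropy(P, Q) - entropy(P)
    return tf.reduce_sum(p * (tf.log(p) - tf.log(q)), axis=-1)

a = np.array([[0.25, 0.1, 0.65], [0.8, 0.15, 0.05]])
b = np.array([[0.7, 0.2, 0.1], [0.15, 0.8, 0.05]])

sess = tf.Session()
print(sess.run(kl_manual(a, b)))  # approximately [0.88995, 1.08808]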
Assuming that your input tensors prob_a and prob_b are probability tensors that sum to 1 along the last axis, you could do it like this:
def kl(x, y):
    # Wrap the probability tensors in Categorical distributions so the
    # registered closed-form KL implementation can be applied.
    X = tf.distributions.Categorical(probs=x)
    Y = tf.distributions.Categorical(probs=y)
    return tf.distributions.kl_divergence(X, Y)

result = kl(prob_a, prob_b)
A simple example:
import numpy as np
import tensorflow as tf
a = np.array([[0.25, 0.1, 0.65], [0.8, 0.15, 0.05]])
b = np.array([[0.7, 0.2, 0.1], [0.15, 0.8, 0.05]])
sess = tf.Session()
print(kl(a, b).eval(session=sess)) # [0.88995184 1.08808468]
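As a side note, tf.distributions was removed in TensorFlow 2.x; its distributions now live in the separate TensorFlow Probability package. Assuming tensorflow_probability is installed, a minimal TF 2.x sketch of the same computation would be:

import numpy as np
import tensorflow_probability as tfp

a = np.array([[0.25, 0.1, 0.65], [0.8, 0.15, 0.05]])
b = np.array([[0.7, 0.2, 0.1], [0.15, 0.8, 0.05]])

# TF 2.x runs eagerly, so no Session is needed.
X = tfp.distributions.Categorical(probs=a)
Y = tfp.distributions.Categorical(probs=b)
print(tfp.distributions.kl_divergence(X, Y).numpy())  # should match the values above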
You would get the same result with
np.sum(a * np.log(a / b), axis=1)
However, this implementation is a bit buggy (checked in TensorFlow 1.8.0). If you have zero probabilities in a, e.g. if you try [0.8, 0.2, 0.0] instead of [0.8, 0.15, 0.05], you will get nan, even though by the Kullback–Leibler definition the term 0 * log(0 / b) should contribute zero.
To mitigate this, one should add some small numerical constant. It is also prudent to use tf.distributions.kl_divergence(X, Y, allow_nan_stats=False) to cause a runtime error in such situations. Also, if there are zeros in b, you will get inf values, which won't be caught by the allow_nan_stats=False option, so those have to be handled as well.
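One possible way to handle both cases is to add a small constant to both tensors and renormalize before building the distributions. This is only a sketch under those assumptions; kl_safe and the epsilon of 1e-8 are illustrative choices, not part of the TensorFlow API:

def kl_safe(x, y, eps=1e-8):
    # Shift both distributions away from zero, then renormalize so each
    # row again sums to 1 along the last axis.
    x = tf.convert_to_tensor(x)
    y = tf.convert_to_tensor(y)
    x = (x + eps) / tf.reduce_sum(x + eps, axis=-1, keepdims=True)
    y = (y + eps) / tf.reduce_sum(y + eps, axis=-1, keepdims=True)
    X = tf.distributions.Categorical(probs=x)
    Y = tf.distributions.Categorical(probs=y)
    return tf.distributions.kl_divergence(X, Y, allow_nan_stats=False)

a0 = np.array([[0.8, 0.2, 0.0]])  # contains a zero: source of nan in plain kl()
b0 = np.array([[0.0, 0.2, 0.8]])  # zero where a0 is nonzero: source of inf in plain kl()
# Reusing sess from the example above.
print(kl_safe(a0, b0).eval(session=sess))  # finite value, no nan or inf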