Is there a built-in KL divergence loss function in TensorFlow?

I have two tensors, prob_a and prob_b with shape [None, 1000], and I want to compute the KL divergence from prob_a to prob_b. Is there a built-in function for this in TensorFlow? I tried using tf.contrib.distributions.kl(prob_a, prob_b), but it gives:

NotImplementedError: No KL(dist_a || dist_b) registered for dist_a type Tensor and dist_b type Tensor

If there is no built-in function, what would be a good workaround?

Asked by Transcendental on Jan 25 '17



1 Answer

Assuming that your input tensors prob_a and prob_b are probability tensors that sum to 1 along the last axis, you could do it like this:

def kl(x, y):
    # Wrap each row of probabilities in a Categorical distribution and
    # use the KL divergence registered for that distribution pair.
    X = tf.distributions.Categorical(probs=x)
    Y = tf.distributions.Categorical(probs=y)
    return tf.distributions.kl_divergence(X, Y)

result = kl(prob_a, prob_b)

A simple example:

import numpy as np
import tensorflow as tf

# Each row is a probability distribution over three classes (rows sum to 1).
a = np.array([[0.25, 0.1, 0.65], [0.8, 0.15, 0.05]])
b = np.array([[0.7, 0.2, 0.1], [0.15, 0.8, 0.05]])

sess = tf.Session()
print(kl(a, b).eval(session=sess))  # [0.88995184 1.08808468]

You would get the same result with

np.sum(a * np.log(a / b), axis=1) 

However, this implementation is a bit buggy (checked in TensorFlow 1.8.0).

If you have zero probabilities in a, e.g. if you try [0.8, 0.2, 0.0] instead of [0.8, 0.15, 0.05], you will get nan, even though by the Kullback-Leibler definition 0 * log(0 / b) should contribute zero.
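
For example, replacing the second row of a from the example above with [0.8, 0.2, 0.0]:

a_bad = np.array([[0.25, 0.1, 0.65], [0.8, 0.2, 0.0]])  # second row contains a zero
print(kl(a_bad, b).eval(session=sess))  # [0.88995184 nan]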

To mitigate this, one should add some small numerical constant to the probabilities. It is also prudent to use tf.distributions.kl_divergence(X, Y, allow_nan_stats=False) to cause a runtime error in such situations.

Also, if there are some zeros in b, you will get inf values, which won't be caught by the allow_nan_stats=False option, so those have to be handled as well; see the sketch below.
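
Putting both guards together, here is a minimal sketch (the helper name kl_safe, the eps value, and the renormalization step are illustrative choices, not part of the answer above):

def kl_safe(x, y, eps=1e-8):
    # Clip both inputs away from zero so that neither log(0) nor
    # division by zero can occur, then renormalize each row to sum to 1.
    x = tf.clip_by_value(x, eps, 1.0)
    y = tf.clip_by_value(y, eps, 1.0)
    x = x / tf.reduce_sum(x, axis=-1, keepdims=True)
    y = y / tf.reduce_sum(y, axis=-1, keepdims=True)
    X = tf.distributions.Categorical(probs=x)
    Y = tf.distributions.Categorical(probs=y)
    # allow_nan_stats=False turns remaining nans into runtime errors
    # instead of letting them propagate silently.
    return tf.distributions.kl_divergence(X, Y, allow_nan_stats=False)

With this, kl_safe(a_bad, b) from the example above evaluates to finite values instead of nan.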

Answered by meferne on Sep 18 '22