How can I compute KL divergence in Keras while using TensorFlow as the backend? I compute the L1 loss as follows:
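from keras import backend as K
def l1_loss(y_true, y_pred):
    # Sum of absolute errors over the last axis: one loss value per sample
    return K.sum(K.abs(y_pred - y_true), axis=-1)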
KL divergence can be calculated as the negative sum, over all events, of the probability of each event in P multiplied by the log of the ratio of that event's probability in Q to its probability in P. Each term of the sum is the divergence contributed by that event.
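Written as a formula (standard notation, with the sum running over all events x), the description above is the first form below; flipping the ratio inside the log removes the minus sign and gives the form that the Keras implementation further down computes, with y_true as P and y_pred as Q:

```latex
D_{\mathrm{KL}}(P \parallel Q) = -\sum_{x} P(x)\,\log\frac{Q(x)}{P(x)} = \sum_{x} P(x)\,\log\frac{P(x)}{Q(x)}
```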
KL divergence is also called relative entropy: it is the difference between the cross-entropy and the entropy, and it measures how far the predicted probability distribution is from the actual one (it is not a true distance, because it is not symmetric). It equals 0 when the predicted probability distribution is the same as the actual probability distribution.
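A small pure-Python sketch can check these properties numerically. The helper names and the example distributions are illustrative assumptions, not part of Keras; natural logs are used, as in the Keras code. It shows that KL(P || Q) equals cross-entropy H(P, Q) minus entropy H(P), that it is 0 for identical distributions, and that swapping P and Q changes the value:

```python
import math

def entropy(p):
    """H(P) = -sum_x P(x) log P(x), in nats; zero-probability events contribute 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """H(P, Q) = -sum_x P(x) log Q(x), in nats."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_x P(x) log(P(x) / Q(x)), in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.10, 0.40, 0.50]  # "actual" distribution (example values)
q = [0.80, 0.15, 0.05]  # "predicted" distribution (example values)

print(kl_divergence(p, q))
print(cross_entropy(p, q) - entropy(p))  # same value: KL = H(P, Q) - H(P)
print(kl_divergence(p, p))               # 0.0 for identical distributions
print(kl_divergence(q, p))               # differs from KL(P || Q): not symmetric
```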
Creating custom loss functions in Keras
A custom loss function can be created by defining a function that takes the true values and the predicted values as its two required parameters. The function should return an array of losses, one value per sample. It can then be passed as the loss argument of model.compile().
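Following that recipe, a KL-divergence loss can be written in the same style as the question's l1_loss. This is a sketch, not the library's own code: it uses tf.* ops instead of K.* (an assumption made so it also runs with newer Keras versions, where most backend ops are no longer available), and the EPS value, model shape and random data are illustrative only:

```python
import numpy as np
import tensorflow as tf

EPS = 1e-7  # small constant to avoid log(0) and division by zero (assumed value)

def kl_loss(y_true, y_pred):
    """Custom KL-divergence loss: one value per sample, summed over the last axis."""
    y_true = tf.clip_by_value(tf.cast(y_true, y_pred.dtype), EPS, 1.0)
    y_pred = tf.clip_by_value(y_pred, EPS, 1.0)
    return tf.reduce_sum(y_true * tf.math.log(y_true / y_pred), axis=-1)

# A tiny model whose softmax output is a probability distribution over 3 classes.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss=kl_loss)  # pass the function at compile time

x = np.random.rand(8, 4).astype("float32")
y = np.random.dirichlet(np.ones(3), size=8).astype("float32")  # random target distributions
history = model.fit(x, y, epochs=1, verbose=0)
```

Because the targets and the softmax outputs are both probability distributions, the loss is non-negative and reaches 0 only when they match.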
So, in simple terms, KL divergence is a measure of how two probability distributions (say 'p' and 'q') differ from each other, which is exactly what we care about when calculating the loss function.
Keras already implements KL divergence; as its source code shows, the implementation is just:
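from keras import backend as K
def kullback_leibler_divergence(y_true, y_pred):
    # Clip both distributions to [epsilon, 1] to avoid log(0) and division by zero
    y_true = K.clip(y_true, K.epsilon(), 1)
    y_pred = K.clip(y_pred, K.epsilon(), 1)
    return K.sum(y_true * K.log(y_true / y_pred), axis=-1)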
So you can simply pass kld, KLD or kullback_leibler_divergence as the loss.
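For example, the built-in loss can be selected by its string alias at compile time. This is a minimal sketch: the model shape and the random Dirichlet targets are assumptions chosen so that both targets and predictions are valid probability distributions:

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(3, activation="softmax"),  # outputs must be probabilities
])
# "kld" is a string alias that Keras resolves to its built-in KL-divergence loss.
model.compile(optimizer="adam", loss="kld")

x = np.random.rand(8, 4).astype("float32")
y = np.random.dirichlet(np.ones(3), size=8).astype("float32")  # each row sums to 1
history = model.fit(x, y, epochs=1, verbose=0)
print(history.history["loss"])
```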
You can simply use the tf.keras.losses.kullback_leibler_divergence function.
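Outside of compile(), the loss can also be evaluated directly on tensors. This sketch swaps in the class-based tf.keras.losses.KLDivergence, which computes the same per-sample quantity and then averages it over the batch; the two example distributions are illustrative:

```python
import tensorflow as tf

y_true = tf.constant([[0.5, 0.5]])  # actual distribution P
y_pred = tf.constant([[0.9, 0.1]])  # predicted distribution Q

kl = tf.keras.losses.KLDivergence()  # class-based counterpart of the KL loss function
value = float(kl(y_true, y_pred))
print(value)  # 0.5 * log(0.5/0.9) + 0.5 * log(0.5/0.1), roughly 0.5108
```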
If you want to use it as an activity regularizer, you can create a simple regularization function:
import keras # if using keras
# from tensorflow import keras # if using tf.keras
kullback_leibler_divergence = keras.losses.kullback_leibler_divergence
K = keras.backend
def kl_divergence_regularizer(inputs):
    means = K.mean(inputs, axis=0)
    return 0.01 * (kullback_leibler_divergence(0.05, means)
                 + kullback_leibler_divergence(1 - 0.05, 1 - means))
In this example, 0.01 is the regularization weight, and 0.05 is the sparsity target. Then use it like this:
keras.layers.Dense(32, activation="sigmoid",
                   activity_regularizer=kl_divergence_regularizer)
For example, this would be the encoding layer of a sparse autoencoder.
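A complete toy sparse autoencoder built around that layer might look like the sketch below. It is an assumption-laden variant, not the answer's exact code: the regularizer is rewritten with tf.* ops as the Bernoulli KL divergence (mathematically the same sum of the two KL terms above, apart from clipping), and the input size, decoder layer and random data are illustrative only:

```python
import numpy as np
import tensorflow as tf

SPARSITY_TARGET = 0.05  # desired average activation of each coding unit
REG_WEIGHT = 0.01       # weight of the sparsity penalty

def bernoulli_kl(p, q):
    """Elementwise KL between Bernoulli(p) and Bernoulli(q); q is clipped away from 0 and 1."""
    q = tf.clip_by_value(q, 1e-7, 1.0 - 1e-7)
    return p * tf.math.log(p / q) + (1.0 - p) * tf.math.log((1.0 - p) / (1.0 - q))

def kl_divergence_regularizer(inputs):
    means = tf.reduce_mean(inputs, axis=0)  # mean activation of each unit over the batch
    return REG_WEIGHT * tf.reduce_sum(bernoulli_kl(SPARSITY_TARGET, means))

autoencoder = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(32, activation="sigmoid",
                          activity_regularizer=kl_divergence_regularizer),  # sparse coding layer
    tf.keras.layers.Dense(20, activation="sigmoid"),  # reconstruction (decoder) layer
])
autoencoder.compile(optimizer="adam", loss="mse")

x = np.random.rand(64, 20).astype("float32")
history = autoencoder.fit(x, x, epochs=1, batch_size=16, verbose=0)  # input is its own target
```

The penalty is close to 0 when every coding unit's mean activation equals the sparsity target and grows as the means drift away from it.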
Note that kullback_leibler_divergence expects all the class probabilities, even in the case of binary classification (giving just the positive class probability is not enough). This is why we compute the KLD for both 0.05 and 1-0.05 in the function above.
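A quick numeric check in plain Python shows why. The kl helper mirrors the Keras formula without clipping, and the mean activation of 0.5 is an arbitrary example value. With only the positive-class probability, the sum can come out negative, which a divergence never should; including both classes gives the proper non-negative value:

```python
import math

def kl(p, q):
    """Discrete KL divergence sum_i p_i * log(p_i / q_i), like the Keras loss without clipping."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

rho, rho_hat = 0.05, 0.5  # sparsity target and an example mean activation

positive_only = kl([rho], [rho_hat])                       # only the "active" probability
both_classes = kl([rho, 1 - rho], [rho_hat, 1 - rho_hat])  # full Bernoulli distributions

print(positive_only)  # negative (about -0.115), so it is not a valid divergence
print(both_classes)   # positive (about 0.495)
```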