 

How to handle log(0) when using cross entropy

In order to keep the case simple and intuitive, I will use binary (0 and 1) classification for illustration.

Loss function

import numpy as np

loss = np.multiply(np.log(predY), Y) + np.multiply((1 - Y), np.log(1 - predY))  # cross entropy
cost = -np.sum(loss) / m  # m is the number of examples in the batch

Probability of Y

predY is computed with the sigmoid function, and the logits can be thought of as the raw output of a neural network before the classification step.

predY = sigmoid(logits) #binary case

def sigmoid(X):
    return 1/(1 + np.exp(-X))

Problem

Suppose we are running a feed-forward net.

Inputs: [3, 5]: 3 is the number of examples and 5 is the feature size (fabricated data)

Num of hidden units: 100 (only 1 hidden layer)

Iterations: 10000

Such an arrangement is set up to overfit. When it overfits, the network predicts the probabilities for the training examples perfectly; in other words, the sigmoid outputs exactly 1 or 0 because the exponential explodes (or vanishes). In that case, np.log(0) is undefined. How do you usually handle this issue?
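
For illustration (a minimal NumPy sketch, not part of the original question), this is the saturation in action: for a large enough logit, float64 rounds the sigmoid output to exactly 1.0, and log(1 - predY) becomes -inf:

import numpy as np

def sigmoid(X):
    return 1 / (1 + np.exp(-X))

z = 40.0                    # a large logit, as produced by an overfit network
p = sigmoid(z)
print(p)                    # 1.0 exactly: 1 + exp(-40) rounds to 1.0 in float64
print(np.log(1 - p))        # -inf, with a "divide by zero encountered in log" warning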

asked Apr 25 '18 by GabrielChu


People also ask

What log does cross-entropy use?

It turns out that the formulation of cross-entropy between two probability distributions coincides with the negative log-likelihood. However, as implemented in PyTorch, the CrossEntropyLoss expects raw prediction values while the NLLLoss expects log probabilities.
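
As a small illustration (a PyTorch sketch, not part of the original excerpt), the two losses agree once log_softmax is applied to the raw logits:

import torch
import torch.nn.functional as F

logits = torch.randn(4, 3)              # raw scores for 4 examples, 3 classes
targets = torch.tensor([0, 2, 1, 2])

ce = F.cross_entropy(logits, targets)                       # expects raw logits
nll = F.nll_loss(F.log_softmax(logits, dim=1), targets)     # expects log probabilities

print(torch.allclose(ce, nll))          # True: same quantity, different expected inputs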

Is cross-entropy log loss?

Cross-entropy loss, or log loss, measures the performance of a classification model whose output is a probability value between 0 and 1. Cross-entropy loss increases as the predicted probability diverges from the actual label.
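
For a concrete feel (a hypothetical numeric sketch, not from the excerpt), the binary cross-entropy grows quickly as the predicted probability moves away from the true label:

import numpy as np

def binary_cross_entropy(y, p):
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# True label is 1; the loss grows as the predicted probability diverges from it.
print(binary_cross_entropy(1, 0.9))    # ~0.105
print(binary_cross_entropy(1, 0.5))    # ~0.693
print(binary_cross_entropy(1, 0.1))    # ~2.303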

What is the base of log in cross-entropy loss?

In the definition of entropy, H(p) = -sum over x of p(x) log p(x), the log is calculated to base 2. The reason for the negative sign: log(p(x)) < 0 for all p(x) in (0, 1), since p(x) is a probability distribution and its values must therefore range between 0 and 1.
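
As a quick check of the base-2 convention (an illustrative sketch, not from the excerpt), the entropy of a fair coin comes out to exactly one bit:

import numpy as np

def entropy_bits(p):
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log2(p))      # base-2 log, so the result is in bits

print(entropy_bits([0.5, 0.5]))         # 1.0 bit for a fair coin
print(entropy_bits([0.9, 0.1]))         # ~0.469 bits: less uncertainty, smaller entropy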

Can binary cross-entropy be zero?

Binary cross-entropy compares each of the predicted probabilities to the actual class output, which can be either 0 or 1.
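
A tiny numeric sketch (hypothetical values, not from the excerpt): the loss only approaches zero as the predicted probability approaches the 0/1 label, which is exactly the saturated regime where log(0) shows up in the other term:

import numpy as np

def bce(y, p):
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

print(bce(1, 0.999999))    # ~1e-06: loss shrinks toward zero as the prediction matches the label
print(bce(0, 0.000001))    # same from the other class; at exactly p = 1 or p = 0, log(0) appears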


1 Answer

How do you usually handle this issue?

Add a small number (something like 1e-15) to predY. It doesn't change the predictions by much, and it solves the log(0) issue.
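
A minimal sketch of that fix (hypothetical values standing in for the question's Y, predY and m; np.clip keeps both predY and 1 - predY away from zero, which is the same idea as adding a small epsilon):

import numpy as np

# Hypothetical values standing in for the question's Y, predY and m.
Y = np.array([1.0, 0.0, 1.0])
predY = np.array([1.0, 0.0, 0.9])          # saturated predictions would make np.log blow up
m = Y.shape[0]

eps = 1e-15
predY = np.clip(predY, eps, 1 - eps)       # keep probabilities away from exact 0 and 1

loss = np.multiply(np.log(predY), Y) + np.multiply((1 - Y), np.log(1 - predY))
cost = -np.sum(loss) / m
print(cost)                                # finite, no log(0) warning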

BTW, if your algorithm outputs exact zeros and ones, it might be useful to check the histogram of the returned probabilities; when the algorithm is that sure that something is happening, it can be a sign of overfitting.
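
One way to do that check (a sketch with made-up probabilities; np.histogram does the binning):

import numpy as np

# Made-up probabilities standing in for the model's outputs on the training set.
predY = np.array([0.999, 0.001, 1.0, 0.0, 0.998, 0.52])

counts, edges = np.histogram(predY, bins=10, range=(0.0, 1.0))
print(counts)    # most of the mass piling up in the 0 and 1 bins hints at overconfidence/overfitting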

answered Sep 17 '22 by Jakub Bartczuk