To keep the case simple and intuitive, I will use binary (0 and 1) classification for illustration.
Loss function
loss = np.multiply(np.log(predY), Y) + np.multiply((1 - Y), np.log(1 - predY))  # binary cross entropy
cost = -np.sum(loss) / m  # m = number of examples in the batch
Probability of Y
predY is computed by applying the sigmoid to the logits; the logits can be thought of as the output of the neural network before it reaches the classification step:
predY = sigmoid(logits)  # binary case

def sigmoid(X):
    return 1 / (1 + np.exp(-X))
Problem
Suppose we are running a feed-forward net.
Inputs: [3, 5], where 3 is the number of examples and 5 is the feature size (fabricated data)
Num of hidden units: 100 (only 1 hidden layer)
Iterations: 10000
Such an arrangement is set up to overfit. When it is overfitting, we can perfectly predict the probability for the training examples; in other words, the sigmoid outputs exactly 1 or 0, because the exponential inside it overflows or underflows in floating point. If this is the case, we would have np.log(0), which is undefined. How do you usually handle this issue?
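A minimal sketch of the failure mode (the logit value is fabricated; an overfit net easily produces scores this large):

import numpy as np

predY = 1 / (1 + np.exp(-40.0))  # sigmoid of a fabricated, extreme logit from an overfit net
print(predY)                     # exactly 1.0: 1 + exp(-40) rounds to 1.0 in float64
print(np.log(1 - predY))         # -inf, with a RuntimeWarning: divide by zero encountered in log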
It turns out that the formulation of cross-entropy between two probability distributions coincides with the negative log-likelihood. However, as implemented in PyTorch, CrossEntropyLoss expects raw logits, while NLLLoss expects log-probabilities.
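A small sketch of that distinction (assuming PyTorch is available; the tensor values are fabricated): the two losses agree once the logits are passed through log-softmax.

import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, -1.0], [0.5, 1.5]])  # raw, unnormalized scores
target = torch.tensor([0, 1])                     # true class indices

ce = nn.CrossEntropyLoss()(logits, target)                 # expects raw logits
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), target)   # expects log-probabilities
print(torch.allclose(ce, nll))                             # True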
Cross-entropy loss, or log loss, measures the performance of a classification model whose output is a probability value between 0 and 1. Cross-entropy loss increases as the predicted probability diverges from the actual label.
Equation 1: Definition of entropy, H(p) = -Σ_x p(x) log₂ p(x). Note the log is calculated to base 2. The reason for the negative sign: log(p(x)) < 0 for all p(x) in (0, 1). p(x) is a probability distribution, and therefore its values must range between 0 and 1.
Binary cross entropy compares each of the predicted probabilities to the actual class output, which can be either 0 or 1.
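A quick worked example of that behaviour (fabricated probabilities for a true label of 1):

import numpy as np

y = 1                        # actual class output
for p in [0.9, 0.5, 0.01]:   # predicted probability of class 1
    bce = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    print(p, round(bce, 3))  # 0.105, 0.693, 4.605: loss grows as p diverges from y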
How do you usually handle this issue?
Add a small number (something like 1e-15) to predY
- such a tiny offset doesn't noticeably change the predictions, and it solves the log(0) issue. Since the loss also contains log(1 - predY), the prediction should be kept away from 1 as well, e.g. by clipping predY to [eps, 1 - eps], as in the sketch below.
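A minimal sketch of that fix; clipping (rather than only adding eps) keeps both log(predY) and log(1 - predY) finite. The helper name and the eps value are illustrative choices, not a fixed convention:

import numpy as np

def safe_cross_entropy(predY, Y, eps=1e-15):
    predY = np.clip(predY, eps, 1 - eps)  # predY == 0 breaks log(predY); predY == 1 breaks log(1 - predY)
    loss = np.multiply(np.log(predY), Y) + np.multiply((1 - Y), np.log(1 - predY))
    return -np.sum(loss) / Y.shape[0]

Y = np.array([1.0, 0.0, 1.0])
predY = np.array([1.0, 0.0, 0.9])    # saturated predictions from an overfit net
print(safe_cross_entropy(predY, Y))  # finite cost instead of nan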
By the way, if your algorithm outputs exact zeros and ones, it might be useful to check the histogram of the returned probabilities (a rough sketch follows below) - when an algorithm is that sure that something is happening, it can be a sign of overfitting.
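A rough sketch of that check (fabricated probabilities standing in for a model's output):

import numpy as np

probs = np.array([0.999, 1.0, 0.998, 0.001, 0.0, 0.997, 0.002, 0.61])  # hypothetical outputs
hist, edges = np.histogram(probs, bins=10, range=(0.0, 1.0))
print(hist)  # nearly all mass in the first and last bins: the model is very (over)confident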