Why does sigmoid & crossentropy of Keras/tensorflow have low precision?

Tags: python, tensorflow, classification, keras, cross-entropy

I have the following simple neural network (with only 1 neuron) to test the numerical precision of the sigmoid activation and the binary_crossentropy loss in Keras:

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(1, input_dim=1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

To simplify the test, I manually set the only weight to 1 and the bias to 0, and then evaluate the model on the 2-point training set {(-a, 0), (a, 1)}, i.e.

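import numpy as np

model.set_weights([np.ones((1, 1)), np.zeros(1)])  # weight = 1, bias = 0
keras_ce = np.zeros(40)
my_ce = np.zeros(40)
y = np.array([0, 1])
for a in range(40):
    x = np.array([-a, a])
    keras_ce[a] = model.evaluate(x, y)[0]  # cross-entropy computed by keras/tensorflow
    my_ce[a] = np.log(1 + np.exp(-a))      # my own computation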

My question: I found that the binary cross-entropy (keras_ce) computed by Keras/TensorFlow reaches a floor of 1.09e-7 when a is approx. 16, as illustrated below (blue line). It doesn't decrease further as a keeps growing. Why is that?

[Figure: cross-entropy vs. a; Keras/TensorFlow result (blue) and my own computation ('my', orange)]

This neural network has only 1 neuron, whose weight is set to 1 and bias to 0. With the 2-point training set {(-a, 0), (a, 1)}, the binary_crossentropy is just

-1/2 [ log(1 - 1/(1+exp(a)) ) + log( 1/(1+exp(-a)) ) ] = log(1+exp(-a))
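Spelled out, the simplification relies on 1 - 1/(1+e^a) = 1/(1+e^(-a)), so the two terms in the bracket are equal:

```latex
1 - \frac{1}{1+e^{a}} = \frac{e^{a}}{1+e^{a}} = \frac{1}{1+e^{-a}}
\quad\Longrightarrow\quad
-\frac{1}{2}\left[\log\left(1-\frac{1}{1+e^{a}}\right) + \log\frac{1}{1+e^{-a}}\right]
= -\log\frac{1}{1+e^{-a}} = \log\left(1+e^{-a}\right)
```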

So the cross-entropy should decrease as a increases, as illustrated in orange ('my') above. Is there some Keras/Tensorflow/Python setup I can change to increase its precision? Or am I mistaken somewhere? I'd appreciate any suggestions/comments/answers.


asked Sep 01 '18 07:09

syeh_106

2 Answers

TL;DR version: the probability values (i.e. the outputs of the sigmoid function) are clipped for numerical stability when computing the loss function.

If you inspect the source code, you will find that using binary_crossentropy as the loss results in a call to the binary_crossentropy function in the losses.py file:

from keras import backend as K

def binary_crossentropy(y_true, y_pred):
    return K.mean(K.binary_crossentropy(y_true, y_pred), axis=-1)

which in turn, as you can see, calls the equivalent backend function. When TensorFlow is the backend, that results in a call to the binary_crossentropy function in the tensorflow_backend.py file:

def binary_crossentropy(target, output, from_logits=False):
    """ Docstring ..."""

    # Note: tf.nn.sigmoid_cross_entropy_with_logits
    # expects logits, Keras expects probabilities.
    if not from_logits:
        # transform back to logits
        _epsilon = _to_tensor(epsilon(), output.dtype.base_dtype)
        output = tf.clip_by_value(output, _epsilon, 1 - _epsilon)
        output = tf.log(output / (1 - output))

    return tf.nn.sigmoid_cross_entropy_with_logits(labels=target,
                                                   logits=output)

As you can see, the from_logits argument is set to False by default. Therefore, the if condition evaluates to true and, as a result, the values in output are clipped to the range [epsilon, 1-epsilon]. That's why, no matter how small or large a probability is, it cannot be smaller than epsilon or greater than 1-epsilon. And that explains why the output of the binary_crossentropy loss is also bounded: even a perfectly classified sample contributes at least about -log(1 - epsilon) ≈ epsilon to the loss.

Now, what is this epsilon? It is a very small constant used for numerical stability (e.g. to prevent division by zero or other undefined operations). To find its value, you can inspect the source code further; it is defined in the common.py file:

_EPSILON = 1e-7

def epsilon():
    """Returns the value of the fuzz factor used in numeric expressions.
    # Returns
        A float.
    # Example
    ```python
        >>> keras.backend.epsilon()
        1e-07
    ```
    """
    return _EPSILON
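To see where the specific floor of about 1.09e-7 comes from, here is a small NumPy sketch that mirrors the from_logits=False branch quoted above. It assumes float32 (Keras' default float type) and the default epsilon of 1e-7. The names sigmoid, bce_from_logits and keras_style_bce are hypothetical helpers written for illustration, not Keras or TensorFlow functions; bce_from_logits uses the numerically stable formula given in the tf.nn.sigmoid_cross_entropy_with_logits documentation.

```python
import numpy as np

EPS = np.float32(1e-7)  # Keras' default fuzz factor, K.epsilon()
ONE = np.float32(1)

def sigmoid(z):
    # Logistic function evaluated in float32
    return ONE / (ONE + np.exp(-z))

def bce_from_logits(labels, logits):
    # Stable form of sigmoid cross-entropy: max(x, 0) - x * z + log(1 + exp(-|x|))
    return np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))

def keras_style_bce(labels, probs):
    # Mirrors the from_logits=False branch: clip probabilities, map back to logits
    p = np.clip(probs, EPS, ONE - EPS)
    return bce_from_logits(labels, np.log(p / (ONE - p))).mean()

y = np.array([0, 1], dtype=np.float32)
for a in [5, 16, 20, 30]:
    x = np.array([-a, a], dtype=np.float32)  # weight 1, bias 0, so logits == x
    clipped = keras_style_bce(y, sigmoid(x))
    unclipped = bce_from_logits(y, x).mean()
    print(a, clipped, unclipped)
```

In this sketch, once both sigmoid outputs are clipped, the label-0 point contributes roughly 1.0e-7 and the label-1 point roughly 1.19e-7 (in float32, 1 - epsilon rounds to 1 - 2^-23), so their mean is about 1.096e-7. Applying the loss to the logits directly, without the clip, keeps decreasing like log(1 + exp(-a)).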

If for any reason you would like more precision, you can set epsilon to a smaller constant using the set_epsilon function from the backend:

def set_epsilon(e):
    """Sets the value of the fuzz factor used in numeric expressions.
    # Arguments
        e: float. New value of epsilon.
    # Example
    ```python
        >>> from keras import backend as K
        >>> K.epsilon()
        1e-07
        >>> K.set_epsilon(1e-05)
        >>> K.epsilon()
        1e-05
    ```
    """
    global _EPSILON
    _EPSILON = e

However, be aware that setting epsilon to an extremely small positive value, or to zero, may disrupt the stability of computations throughout Keras.
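A float32 detail makes this warning concrete. The plain NumPy sketch below assumes the float32 default; upper_clip_bound, logit_of_saturated_output and label1_loss are hypothetical helpers, and label1_loss re-implements the stable sigmoid cross-entropy formula for a label of 1. For an epsilon below roughly 3e-8, 1 - epsilon already rounds to exactly 1.0, so a saturated sigmoid output is no longer kept away from 1, output / (1 - output) divides by zero, and in this re-implementation the loss term becomes nan.

```python
import numpy as np

ONE = np.float32(1)

def upper_clip_bound(eps):
    # The clip upper bound 1 - epsilon, evaluated in float32
    return ONE - np.float32(eps)

def logit_of_saturated_output(eps):
    # A sigmoid output that has saturated to exactly 1.0, clipped from above
    p = np.minimum(ONE, upper_clip_bound(eps))
    with np.errstate(divide='ignore'):
        return np.log(p / (ONE - p))

def label1_loss(logit):
    # Stable sigmoid cross-entropy for label z = 1: max(x, 0) - x + log(1 + exp(-|x|))
    with np.errstate(invalid='ignore'):
        return np.maximum(logit, 0) - logit + np.log1p(np.exp(-np.abs(logit)))

for eps in [1e-7, 1e-8]:
    logit = logit_of_saturated_output(eps)
    print(eps, upper_clip_bound(eps), logit, label1_loss(logit))
```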

answered Sep 19 '22 17:09

today

I think Keras takes numerical stability into account. Let's trace how Keras calculates the loss.

First,

def binary_crossentropy(y_true, y_pred):
    return K.mean(K.binary_crossentropy(y_true, y_pred), axis=-1)

Then,

def binary_crossentropy(target, output, from_logits=False):
    """Binary crossentropy between an output tensor and a target tensor.

    # Arguments
        target: A tensor with the same shape as `output`.
        output: A tensor.
        from_logits: Whether `output` is expected to be a logits tensor.
            By default, we consider that `output`
            encodes a probability distribution.

    # Returns
        A tensor.
    """
    # Note: tf.nn.sigmoid_cross_entropy_with_logits
    # expects logits, Keras expects probabilities.
    if not from_logits:
        # transform back to logits
        _epsilon = _to_tensor(epsilon(), output.dtype.base_dtype)
        output = tf.clip_by_value(output, _epsilon, 1 - _epsilon)
        output = tf.log(output / (1 - output))


    return tf.nn.sigmoid_cross_entropy_with_logits(labels=target,
                                                   logits=output)

Notice that tf.clip_by_value is used for numerical stability.

Let's compare Keras' binary_crossentropy, TensorFlow's tf.nn.sigmoid_cross_entropy_with_logits, and a custom loss function that eliminates the value clipping:

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense
import keras

# keras
model = Sequential()
model.add(Dense(units=1, activation='sigmoid', input_shape=(
    1,), weights=[np.ones((1, 1)), np.zeros(1)]))
# print(model.get_weights())
model.compile(loss='binary_crossentropy',
              optimizer='adam', metrics=['accuracy'])

# tensorflow
G = tf.Graph()
with G.as_default():
    x_holder = tf.placeholder(dtype=tf.float32, shape=(2,))
    y_holder = tf.placeholder(dtype=tf.float32, shape=(2,))
    entropy = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
        logits=x_holder, labels=y_holder))
sess = tf.Session(graph=G)


# keras with custom loss function
def customLoss(target, output):
    # if not from_logits:
    #     # transform back to logits
    #     _epsilon = _to_tensor(epsilon(), output.dtype.base_dtype)
    #     output = tf.clip_by_value(output, _epsilon, 1 - _epsilon)
    #     output = tf.log(output / (1 - output))
    output = tf.log(output / (1 - output))
    return tf.nn.sigmoid_cross_entropy_with_logits(labels=target,
                                                   logits=output)
model_m = Sequential()
model_m.add(Dense(units=1, activation='sigmoid', input_shape=(
    1,), weights=[np.ones((1, 1)), np.zeros(1)]))
# print(model.get_weights())
model_m.compile(loss=customLoss,
                optimizer='adam', metrics=['accuracy'])


N = 100
xaxis = np.linspace(10, 20, N)
keras_ce = np.zeros(N)
tf_ce = np.zeros(N)
my_ce = np.zeros(N)
keras_custom = np.zeros(N)

y = np.array([0, 1])
for i, a in enumerate(xaxis):
    x = np.array([-a, a])
    # cross-entropy computed by keras/tensorflow
    keras_ce[i] = model.evaluate(x, y)[0]
    my_ce[i] = np.log(1+np.exp(-a))  # My own computation
    tf_ce[i] = sess.run(entropy, feed_dict={x_holder: x, y_holder: y})
    keras_custom[i] = model_m.evaluate(x, y)[0]
# print(model.get_weights())

plt.plot(xaxis, keras_ce, label='keras')
plt.plot(xaxis, my_ce, 'b',  label='my_ce')
plt.plot(xaxis, tf_ce, 'r:', linewidth=5, label='tensorflow')
plt.plot(xaxis, keras_custom, '--', label='custom loss')
plt.xlabel('a')
plt.ylabel('xentropy')
plt.yscale('log')
plt.legend()
plt.savefig('compare.jpg')
plt.show()

We can see that TensorFlow agrees with the manual computation, while Keras with the custom loss runs into numeric overflow, as expected.

[Figure: compare.jpg, cross-entropy vs. a (log scale) for keras, my_ce, tensorflow and custom loss]
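The overflow in the custom loss can be traced to float32 rounding. A minimal sketch, assuming a plain float32 evaluation of 1 / (1 + exp(-a)) (TensorFlow's own sigmoid kernel may round slightly differently); sigmoid32 and logit32 are hypothetical helpers. It finds the smallest integer a at which the sigmoid output is exactly 1.0, the point where the unclipped log(output / (1 - output)) in customLoss becomes infinite.

```python
import numpy as np

ONE = np.float32(1)

def sigmoid32(a):
    # Plain logistic function evaluated entirely in float32
    return ONE / (ONE + np.exp(-np.float32(a)))

def logit32(a):
    # The customLoss step: log(output / (1 - output)) with no clipping
    p = sigmoid32(a)
    with np.errstate(divide='ignore'):
        return np.log(p / (ONE - p))

# Smallest integer a for which the float32 sigmoid rounds to exactly 1.0
first_saturated = next(a for a in range(1, 40) if sigmoid32(a) == ONE)
print(first_saturated, logit32(first_saturated))
```

This lands in the same region as the floor in the question: exp(-16) ≈ 1.1e-7 is already comparable to Keras' epsilon of 1e-7.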

answered Sep 19 '22 17:09

BugKiller
