
Using binary_crossentropy loss in Keras (Tensorflow backend)

In the training example in Keras documentation,

https://keras.io/getting-started/sequential-model-guide/#training

binary_crossentropy is used and a sigmoid activation is added as the network's last layer, but is it necessary to add the sigmoid in the last layer? Here is what I found in the source code:

def binary_crossentropy(output, target, from_logits=False):
  """Binary crossentropy between an output tensor and a target tensor.
  Arguments:
      output: A tensor.
      target: A tensor with the same shape as `output`.
      from_logits: Whether `output` is expected to be a logits tensor.
          By default, we consider that `output`
          encodes a probability distribution.
  Returns:
      A tensor.
  """
  # Note: nn.softmax_cross_entropy_with_logits
  # expects logits, Keras expects probabilities.
  if not from_logits:
    # transform back to logits
    epsilon = _to_tensor(_EPSILON, output.dtype.base_dtype)
    output = clip_ops.clip_by_value(output, epsilon, 1 - epsilon)
    output = math_ops.log(output / (1 - output))
  return nn.sigmoid_cross_entropy_with_logits(labels=target, logits=output)

Keras invokes sigmoid_cross_entropy_with_logits in TensorFlow, but inside the sigmoid_cross_entropy_with_logits function, sigmoid(logits) is computed again.

https://www.tensorflow.org/versions/master/api_docs/python/tf/nn/sigmoid_cross_entropy_with_logits
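
To make the round trip concrete: the "transform back to logits" step is just the inverse of the sigmoid that the last layer applied, so the loss ends up applying the sigmoid, inverting it, and applying it again. A small NumPy illustration (not Keras code, just a hypothetical sketch of the conversion):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.5, 3.0])                # raw logits
p = sigmoid(z)                                # what a sigmoid output layer produces
eps = 1e-7                                    # mirrors _EPSILON / clip_by_value in the backend
p_clipped = np.clip(p, eps, 1 - eps)
z_back = np.log(p_clipped / (1 - p_clipped))  # log(p / (1 - p)) recovers the logits

print(np.allclose(z, z_back, atol=1e-5))      # True: the conversion is a round trip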

So I don't think it makes sense to add a sigmoid at the end, yet seemingly all the binary/multi-label classification examples and tutorials for Keras that I found online add a sigmoid at the end. Besides, I don't understand the meaning of

# Note: nn.softmax_cross_entropy_with_logits
# expects logits, Keras expects probabilities.

Why does Keras expect probabilities? Doesn't it use the nn.softmax_cross_entropy_with_logits function? Does this make sense?

Thanks.

— asked by Ming, Aug 17 '17

2 Answers

You're right, that's exactly what's happening. I believe this is due to historical reasons.

Keras was created before TensorFlow, as a wrapper around Theano. In Theano, one has to compute the sigmoid/softmax manually and then apply the cross-entropy loss function. TensorFlow does everything in one fused op, but the API with a sigmoid/softmax layer had already been adopted by the community.

If you want to avoid unnecessary logit <-> probability conversions, call the binary_crossentropy loss with from_logits=True and don't add the sigmoid layer.
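
For example, a minimal sketch using tf.keras (the exact API depends on your Keras/TensorFlow version; in older standalone Keras you would wrap keras.backend.binary_crossentropy(..., from_logits=True) in a custom loss function):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(20,)),
    tf.keras.layers.Dense(1),  # no sigmoid here: the layer outputs raw logits
])

model.compile(
    optimizer='rmsprop',
    # the loss applies the sigmoid internally, in one fused op
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=['accuracy'],
)

At prediction time you then apply tf.sigmoid yourself if you need probabilities.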

— answered by Maxim

In categorical cross-entropy:

  • if the output is a prediction (probabilities), it computes the cross-entropy directly
  • if the output is logits, it applies softmax_cross_entropy_with_logits

In binary cross-entropy (both paths are sketched below):

  • if the output is a prediction (probabilities), it converts it back to logits and then applies sigmoid_cross_entropy_with_logits
  • if the output is logits, it applies sigmoid_cross_entropy_with_logits directly
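
A minimal sketch of the two binary code paths, using the tf.keras backend (this assumes a TF 2.x / tf.keras setup; argument order and names may differ in older versions):

import tensorflow as tf
from tensorflow.keras import backend as K

y_true = tf.constant([[1.0], [0.0]])
logits = tf.constant([[2.0], [-1.0]])
probs = tf.sigmoid(logits)

# from_logits=False (default): probabilities are clipped and converted back to logits internally
loss_from_probs = K.binary_crossentropy(y_true, probs, from_logits=False)

# from_logits=True: sigmoid_cross_entropy_with_logits is applied directly
loss_from_logits = K.binary_crossentropy(y_true, logits, from_logits=True)

print(loss_from_probs.numpy(), loss_from_logits.numpy())  # essentially the same values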
— answered by W. Sam