 

Keras: binary_crossentropy & categorical_crossentropy confusion

After using TensorFlow for quite a while, I have read some Keras tutorials and implemented some examples. I have found several tutorials for convolutional autoencoders that use keras.losses.binary_crossentropy as the loss function.

I thought binary_crossentropy should not be a multi-class loss function and would most likely expect binary labels, but in fact Keras (with the TF Python backend) calls tf.nn.sigmoid_cross_entropy_with_logits, which is actually intended for classification tasks with multiple, independent classes that are not mutually exclusive.
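To illustrate, here is a minimal check (using the tf.keras API for concreteness; the tensor values are made up) showing that binary_crossentropy accepts targets that are neither one-hot nor mutually exclusive:

    import tensorflow as tf

    # Made-up values: three independent outputs per sample.
    # Targets are neither one-hot nor even strictly 0/1.
    y_true = tf.constant([[1.0, 1.0, 0.7]])  # two "active" classes plus a soft label
    y_pred = tf.constant([[0.9, 0.8, 0.6]])  # independent sigmoid activations

    # Elementwise cross-entropy, averaged over the last axis -> one value per sample.
    loss = tf.keras.losses.binary_crossentropy(y_true, y_pred)
    print(loss.numpy())  # runs fine; rows are not required to sum to 1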

On the other hand, I expected categorical_crossentropy to be intended for multi-class classification where the target classes depend on each other, but are not necessarily one-hot encoded.

However, the Keras documentation states:

(...) when using the categorical_crossentropy loss, your targets should be in categorical format (e.g. if you have 10 classes, the target for each sample should be a 10-dimensional vector that is all-zeros except for a 1 at the index corresponding to the class of the sample).

If I am not mistaken, this is just the special case of one-hot encoded classification tasks; the underlying cross-entropy loss should also work with arbitrary probability distributions as targets ("multi-class", dependent labels)?

Additionally, Keras uses tf.nn.softmax_cross_entropy_with_logits (TF Python backend) for the implementation, whose documentation itself states:

NOTE: While the classes are mutually exclusive, their probabilities need not be. All that is required is that each row of labels is a valid probability distribution. If they are not, the computation of the gradient will be incorrect.
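Indeed, a quick check seems to confirm that soft targets work, as long as each row is a valid probability distribution (again a minimal sketch with made-up values, using the tf.keras API):

    import tensorflow as tf

    # Soft targets: each row is a valid probability distribution, not one-hot.
    y_true = tf.constant([[0.7, 0.2, 0.1]])
    y_pred = tf.constant([[0.6, 0.3, 0.1]])  # softmax output, rows sum to 1

    # Computes -sum(y_true * log(y_pred)) over the last axis.
    loss = tf.keras.losses.categorical_crossentropy(y_true, y_pred)
    print(loss.numpy())  # no error for non-one-hot targets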

Please correct me if I am wrong, but it looks to me as if the Keras documentation is, at the very least, not very precise here.

So, what is the idea behind Keras' naming of these loss functions? Is the documentation correct? If binary cross-entropy really relied on binary labels, it would not work for autoencoders, right? Likewise, categorical cross-entropy should then only work for one-hot encoded labels, if the documentation is correct?

Asked Dec 18 '17 by daniel451

1 Answer

Not sure if this answers your question, but: for softmax (categorical) cross-entropy loss the output layer needs to be a probability distribution (i.e. it must sum to 1); for binary cross-entropy loss it doesn't. Simple as that. (Binary doesn't mean that there are only 2 output classes; it means that each output is treated as an independent binary variable.)
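That is exactly why binary_crossentropy works for autoencoders whose inputs are scaled to [0, 1]: every pixel is an independent sigmoid output. A minimal sketch (the architecture and sizes here are just illustrative assumptions):

    import tensorflow as tf
    from tensorflow.keras import layers

    # Assumed toy setup: flattened 28x28 images scaled to [0, 1].
    inputs = tf.keras.Input(shape=(784,))
    encoded = layers.Dense(32, activation='relu')(inputs)
    # Sigmoid gives each output a value in [0, 1] independently;
    # the outputs need not sum to 1, so binary_crossentropy applies per pixel.
    decoded = layers.Dense(784, activation='sigmoid')(encoded)

    autoencoder = tf.keras.Model(inputs, decoded)
    autoencoder.compile(optimizer='adam', loss='binary_crossentropy')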

Answered Sep 28 '22 by maxymoo