Confusion between Binary_crossentropy and Categorical_crossentropy

I am doing binary classification using a deep neural network. Whenever I use binary_crossentropy, my model does not give good accuracy (it stays close to random prediction). But if I use categorical_crossentropy with an output layer of size 2, I get good accuracy (around 0.90) in only one epoch. Can anyone please explain what is happening here?

Avijit Dasgupta asked May 25 '16


People also ask

What is the difference between Sparse_categorical_crossentropy and Categorical_crossentropy?

Simply put: categorical_crossentropy (cce) expects the labels as one-hot arrays, with one slot per category, while sparse_categorical_crossentropy (scce) expects the labels as integer indices of the matching category.

What is the difference between categorical cross-entropy and sparse categorical cross-entropy loss functions?

The only difference between sparse categorical cross-entropy and categorical cross-entropy is the format of the true labels. When we have a single-label, multi-class classification problem, the labels are mutually exclusive, meaning each data entry can belong to only one class.
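A minimal sketch of this difference using the tf.keras API (the class count and probability values below are made up for illustration):

```python
import numpy as np
import tensorflow as tf

# Hypothetical 3-class problem: the same ground truth in both label formats.
y_true_int = np.array([0, 2, 1])  # integer indices -> sparse_categorical_crossentropy
y_true_onehot = tf.keras.utils.to_categorical(y_true_int, num_classes=3)  # one-hot -> categorical_crossentropy

# Made-up predicted probabilities (rows sum to 1, as a softmax would produce).
y_pred = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.2, 0.6],
                   [0.1, 0.7, 0.2]])

cce = tf.keras.losses.CategoricalCrossentropy()
scce = tf.keras.losses.SparseCategoricalCrossentropy()

# Identical loss values; only the expected label format differs.
print(cce(y_true_onehot, y_pred).numpy())  # ~0.3635
print(scce(y_true_int, y_pred).numpy())    # ~0.3635
```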

What is the purpose of Binary_crossentropy?

binary_crossentropy: used as the loss function for binary classification models. It computes the cross-entropy loss between true labels and predicted labels. categorical_crossentropy: used as the loss function for multi-class classification models where there are two or more output labels.
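The two setups side by side, as a minimal sketch using the modern tf.keras API (input size and class count are made-up placeholders):

```python
import tensorflow as tf

# Binary classification: a single sigmoid output node + binary_crossentropy.
binary_model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
binary_model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Multi-class classification: one softmax node per class + categorical_crossentropy.
# Labels must be one-hot encoded (5 classes here, also a placeholder).
multi_model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(5, activation="softmax"),
])
multi_model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```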

Can I use binary cross-entropy for multiclass classification?

Binary cross-entropy can be used for multi-class classification when an observation can belong to multiple classes at the same time (multi-label classification). In that case, belonging to one class doesn't inform the model about belonging to a different class, and each output node acts as an independent binary classifier.
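A short sketch of that multi-label case, with made-up targets and predictions:

```python
import numpy as np
import tensorflow as tf

# Multi-label targets: each sample may belong to several classes at once,
# so each output node answers an independent yes/no question.
y_true = np.array([[1., 0., 1.],     # this sample is in classes 0 and 2
                   [0., 1., 0.]])
y_pred = np.array([[0.9, 0.2, 0.8],  # independent sigmoid outputs, need not sum to 1
                   [0.1, 0.7, 0.3]])

bce = tf.keras.losses.BinaryCrossentropy()
print(bce(y_true, y_pred).numpy())   # mean of the per-node binary cross-entropies
```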


1 Answer

I also had this problem while trying to use binary_crossentropy with a softmax activation in the output layer. As far as I know, softmax gives the probability of each class, so with 2 output nodes you get p(x1) and p(x2) with p(x1) + p(x2) = 1. But if you have only 1 output node, softmax will always output 1.0 (100%) regardless of the input; that's why your accuracy is close to random prediction (honestly, it will be close to the category distribution of your evaluation set).

Try changing the output-layer activation to sigmoid instead (relu in the output layer would not produce a valid probability).
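Here is a short demonstration of the pitfall and the fix, a minimal sketch using the modern tf.keras API (which postdates the original question); layer sizes and inputs are made-up placeholders:

```python
import tensorflow as tf

# The pitfall: softmax over a single output node is always 1.0,
# so the model predicts the same "probability" for every input.
logits = tf.constant([[2.3], [-1.7]])  # arbitrary single-node pre-activations
print(tf.nn.softmax(logits).numpy())   # [[1.], [1.]] -- constant, whatever the input

# The fix: sigmoid on the single node, paired with binary_crossentropy.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# This is mathematically equivalent to the asker's working setup:
# 2 softmax nodes + categorical_crossentropy with one-hot labels.
```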

Nova answered Sep 20 '22