As mentioned here, cross entropy is not a proper loss function for multi-label classification. My question is: is this also true for cross entropy with softmax? If it is, how can it be reconciled with this part of the documentation?
I should mention that the scope of my question is CNTK.
Softmax is not suited to multi-label classification, and that is probably the reason for your results. Should I use categorical cross-entropy or binary cross-entropy loss for binary predictions?
The objective is to calculate the cross-entropy loss given this information. Softmax is a continuously differentiable function, which makes it possible to calculate the derivative of the loss function with respect to every weight in the neural network.
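For intuition, here is a minimal NumPy sketch (the logits and one-hot target are made up for illustration) of softmax followed by cross-entropy, together with its well-known gradient with respect to the logits:

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())            # subtract the max for numerical stability
        return e / e.sum()

    def cross_entropy(p, y):
        return -np.sum(y * np.log(p))      # y is a one-hot target, p the softmax output

    logits = np.array([2.0, 1.0, 0.1])     # raw network outputs for 3 classes
    target = np.array([1.0, 0.0, 0.0])     # one-hot: this example belongs to class 0
    p = softmax(logits)
    loss = cross_entropy(p, target)
    grad_wrt_logits = p - target           # gradient of softmax + cross entropy w.r.t. the logits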
Binary cross-entropy is for multi-label classification, whereas categorical cross-entropy is for multi-class classification, where each example belongs to a single class.
The most popular loss functions for deep learning classification models are binary cross-entropy and sparse categorical cross-entropy. Binary cross-entropy is useful for binary and multi-label classification problems.
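To make the multi-label case concrete, here is a plain NumPy sketch (the three-label target is invented for illustration) of a sigmoid output with the sum of per-label binary cross-entropies:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    logits  = np.array([1.5, -0.3, 0.8])   # one raw output per label
    targets = np.array([1.0,  0.0, 1.0])   # multi-hot: labels 0 and 2 are both on

    p = sigmoid(logits)                    # an independent probability for each label
    # sum of the per-label binary cross entropies
    loss = -np.sum(targets * np.log(p) + (1 - targets) * np.log(1 - p))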
Multilabel classification typically means "many binary labels". With that definition in mind, cross entropy with softmax is not appropriate for multilabel classification. The document in the second link you provide talks about multiclass problems, not multilabel problems. Cross entropy with softmax is appropriate for multiclass classification. For multilabel classification a common choice is to use the sum of the binary cross entropies of the individual labels. The binary cross entropy can be computed with Logistic in BrainScript or with binary_cross_entropy in Python, as in the sketch below.
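For instance, a minimal sketch with the CNTK 2.x Python API (the feature and label sizes here are invented, and the exact reduction behaviour of binary_cross_entropy is worth checking against the docs):

    import cntk as C

    num_features = 100
    num_labels = 5                                    # 5 independent binary labels

    x = C.input_variable(num_features)
    y = C.input_variable(num_labels)                  # multi-hot target, e.g. [1, 0, 1, 0, 0]

    # sigmoid, not softmax, so each label gets its own independent probability
    z = C.layers.Dense(num_labels, activation=C.sigmoid)(x)

    # binary cross entropy of the per-label probabilities against the targets
    loss = C.binary_cross_entropy(z, y)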
If on the other hand you have a problem with many multiclass labels, then you can use cross_entropy_with_softmax for each of them and CNTK will automatically sum all these loss values.
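A rough sketch of that second case, again with the CNTK Python API (two made-up multiclass labels predicted from a shared hidden layer; here the two losses are simply added together):

    import cntk as C

    num_features = 100
    x = C.input_variable(num_features)

    y1 = C.input_variable(10)                         # first label: 10 classes, one-hot
    y2 = C.input_variable(4)                          # second label: 4 classes, one-hot

    h  = C.layers.Dense(64, activation=C.relu)(x)     # shared hidden layer
    z1 = C.layers.Dense(10)(h)                        # logits for the first label
    z2 = C.layers.Dense(4)(h)                         # logits for the second label

    # one cross entropy with softmax per multiclass label; adding them gives the total loss
    loss = C.cross_entropy_with_softmax(z1, y1) + C.cross_entropy_with_softmax(z2, y2)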