
Binary classification with Softmax

I am training a binary classifier using a sigmoid activation function with binary crossentropy, which gives good accuracy, around 98%.
When I train the same model using softmax with categorical_crossentropy, the accuracy is very low (< 40%).
I am passing the targets for binary_crossentropy as a list of 0s and 1s, e.g. [0,1,1,1,0].

Any idea why this is happening?

This is the model I am using for the second classifier: [model code was shown as a screenshot, not reproduced here]

Asked Aug 21 '17 by AKSHAYAA VAIDYANATHAN


People also ask

Can I use softmax for binary classification?

For binary classification, it should give the same results, because softmax is a generalization of sigmoid to a larger number of classes. That said, the answer is not always yes: you can always formulate the binary classification problem in such a way that both sigmoid and softmax will work.

Why softmax is better than sigmoid for binary classification?

When using softmax, increasing the probability of one class decreases the total probability of all other classes (because of sum-to-1). Using sigmoid, increasing the probability of one class does not change the total probability of the other classes.
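If it helps, here is a small numpy sketch of that sum-to-one coupling (the logit values are arbitrary illustrations):

import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability; the result sums to 1
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.5]))
print(p.round(3), p.sum())  # [0.629 0.231 0.14 ] 1.0

# Raising one logit necessarily shrinks every other probability:
q = softmax(np.array([4.0, 1.0, 0.5]))
print(q.round(3), q.sum())  # [0.926 0.046 0.028] 1.0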

What is softmax classification?

The Softmax classifier uses the cross-entropy loss. It gets its name from the softmax function, which squashes the raw class scores into normalized positive values that sum to one, so that the cross-entropy loss can be applied.

Can sigmoid be used for binary classification?

Sigmoid is equivalent to a 2-element softmax in which the second logit is fixed at zero. Therefore, sigmoid is commonly used for binary classification.
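That equivalence is easy to check numerically (the logit value 1.7 below is an arbitrary choice):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(logits):
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

z = 1.7  # arbitrary raw score
print(sigmoid(z))                      # 0.8455...
print(softmax(np.array([z, 0.0]))[0])  # identical: 2-way softmax, second logit fixed at 0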


1 Answer

Right now, your second model always answers "Class 0", as it has only one class to choose from: your last layer has a single output.

As you have two classes, you need to compute softmax + categorical_crossentropy over two outputs, so the model can pick the more probable one.

Hence, your last layer should be:

model.add(Dense(2, activation='softmax'))  # two outputs, one score per class
model.compile(loss='categorical_crossentropy', ...)
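One caveat worth checking (an assumption here, since your training code is not shown): categorical_crossentropy expects one-hot targets, so a flat list like [0,1,1,1,0] has to be converted first, for example with keras.utils.to_categorical:

import numpy as np
from keras.utils import to_categorical

# Labels in the flat 0/1 format from the question
labels = np.array([0, 1, 1, 1, 0])

# categorical_crossentropy expects one row per sample, one column per class:
# 0 -> [1, 0], 1 -> [0, 1]
one_hot = to_categorical(labels, num_classes=2)
print(one_hot)
# [[1. 0.]
#  [0. 1.]
#  [0. 1.]
#  [0. 1.]
#  [1. 0.]]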

Your sigmoid + binary_crossentropy model, which computes the probability of "Class 0" being True by analyzing just a single output number, is already correct.
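For reference, here is a minimal sketch of that single-output setup (the hidden layer size and input dimension are illustrative assumptions, since the original model is only shown as a screenshot):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(16, activation='relu', input_dim=20))  # sizes assumed for illustration
model.add(Dense(1, activation='sigmoid'))              # single output: a probability
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Targets can stay a flat array of 0s and 1s, e.g. [0, 1, 1, 1, 0]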

EDIT: here is a small explanation of the sigmoid function.

Sigmoid can be viewed as a mapping from the real number line into a probability space:

Sigmoid(x) = 1 / (1 + exp(-x))

Notice that:

Sigmoid(-infinity) = 0   
Sigmoid(0) = 0.5   
Sigmoid(+infinity) = 1   

So if the real-number output of your network is very low, the sigmoid will decide the probability of "Class 0" is close to 0, and decide "Class 1".
On the contrary, if the output of your network is very high, the sigmoid will decide the probability of "Class 0" is close to 1, and decide "Class 0".
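A small numpy illustration of that decision rule (the raw output values are arbitrary; per the convention above, the single output is read as P("Class 0")):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for raw in (-8.0, 0.0, 8.0):
    p_class0 = sigmoid(raw)
    decision = "Class 0" if p_class0 >= 0.5 else "Class 1"
    print(f"raw={raw:+.1f}  P(Class 0)={p_class0:.4f}  ->  {decision}")
# raw=-8.0  P(Class 0)=0.0003  ->  Class 1
# raw=+0.0  P(Class 0)=0.5000  ->  Class 0
# raw=+8.0  P(Class 0)=0.9997  ->  Class 0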

This decision rule is similar to classifying only by the sign of your output. However, that would not allow your model to learn! Indeed, the gradient of this hard binary loss is null nearly everywhere, making it impossible for your model to learn from its errors, as they are not quantified properly.

That's why sigmoid and "binary_crossentropy" are used:
they are a smooth surrogate for that binary loss, with well-behaved gradients that enable learning.
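To make "quantified properly" concrete, here is a minimal numpy sketch of what binary_crossentropy computes (the example predictions are arbitrary):

import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip to avoid log(0), then average the per-sample losses
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0])
print(binary_crossentropy(y_true, np.array([0.9, 0.1])))  # ~0.105: confident and right
print(binary_crossentropy(y_true, np.array([0.6, 0.4])))  # ~0.511: unsure
print(binary_crossentropy(y_true, np.array([0.1, 0.9])))  # ~2.303: confident and wrong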

Also, please find more info about Softmax Function and Cross Entropy

Answered Oct 19 '22 by Yohan Grember