
Binary classification with Softmax

I am training a binary classifier using a sigmoid activation function with binary crossentropy, which gives good accuracy, around 98%.
When I train the same model using softmax with categorical_crossentropy, the accuracy is very low (< 40%).
I am passing the targets for binary_crossentropy as a list of 0s and 1s, e.g. [0,1,1,1,0].

Any idea why this is happening?

This is the model I am using for the second classifier: [model code was shown as a screenshot, not reproduced here]

Asked Aug 21 '17 by AKSHAYAA VAIDYANATHAN


People also ask

Can I use softmax for binary classification?

For binary classification, it should give the same results, because softmax is a generalization of sigmoid to a larger number of classes. That said, the answer is not always yes: you can always formulate the binary classification problem in such a way that both sigmoid and softmax will work.

Why softmax is better than sigmoid for binary classification?

When using softmax, increasing the probability of one class decreases the total probability of all other classes (because of sum-to-1). Using sigmoid, increasing the probability of one class does not change the total probability of the other classes.
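If it helps, here is a small numpy sketch of that sum-to-one coupling (the logit values are arbitrary illustrations):

import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability; the result sums to 1
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.5]))
print(p.round(3), p.sum())  # [0.629 0.231 0.14 ] 1.0

# Raising one logit necessarily shrinks every other probability:
q = softmax(np.array([4.0, 1.0, 0.5]))
print(q.round(3), q.sum())  # [0.926 0.046 0.028] 1.0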

What is softmax classification?

The Softmax classifier uses the cross-entropy loss. It gets its name from the softmax function, which squashes the raw class scores into normalized positive values that sum to one, so that the cross-entropy loss can be applied.

Can sigmoid be used for binary classification?

Sigmoid is equivalent to a 2-element softmax in which the second logit is fixed at zero. Therefore, sigmoid is commonly used for binary classification.
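That equivalence is easy to check numerically (the logit value 1.7 below is an arbitrary choice):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(logits):
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

z = 1.7  # arbitrary raw score
print(sigmoid(z))                      # 0.8455...
print(softmax(np.array([z, 0.0]))[0])  # identical: 2-way softmax, second logit fixed at 0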


1 Answer

Right now, your second model always answers "Class 0", as it has only one class to choose from: your last layer has a single output.

As you have two classes, you need to compute softmax + categorical_crossentropy over two outputs, so the model can pick the more probable one.

Hence, your last layer should be:

model.add(Dense(2, activation='softmax'))  # two outputs, one score per class
model.compile(loss='categorical_crossentropy', ...)
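One caveat worth checking (an assumption here, since your training code is not shown): categorical_crossentropy expects one-hot targets, so a flat list like [0,1,1,1,0] has to be converted first, for example with keras.utils.to_categorical:

import numpy as np
from keras.utils import to_categorical

# Labels in the flat 0/1 format from the question
labels = np.array([0, 1, 1, 1, 0])

# categorical_crossentropy expects one row per sample, one column per class:
# 0 -> [1, 0], 1 -> [0, 1]
one_hot = to_categorical(labels, num_classes=2)
print(one_hot)
# [[1. 0.]
#  [0. 1.]
#  [0. 1.]
#  [0. 1.]
#  [1. 0.]]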

Your sigmoid + binary_crossentropy model, which computes the probability of "Class 0" being True by analyzing just a single output number, is already correct.
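For reference, here is a minimal sketch of that single-output setup (the hidden layer size and input dimension are illustrative assumptions, since the original model is only shown as a screenshot):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(16, activation='relu', input_dim=20))  # sizes assumed for illustration
model.add(Dense(1, activation='sigmoid'))              # single output: a probability
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Targets can stay a flat array of 0s and 1s, e.g. [0, 1, 1, 1, 0]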

EDIT: here is a small explanation of the sigmoid function.

Sigmoid can be viewed as a mapping from the real number line into a probability space:

Sigmoid(x) = 1 / (1 + exp(-x))

Notice that:

Sigmoid(-infinity) = 0   
Sigmoid(0) = 0.5   
Sigmoid(+infinity) = 1   

So if the real-number output of your network is very low, the sigmoid will decide the probability of "Class 0" is close to 0, and decide "Class 1".
On the contrary, if the output of your network is very high, the sigmoid will decide the probability of "Class 0" is close to 1, and decide "Class 0".
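A small numpy illustration of that decision rule (the raw output values are arbitrary; per the convention above, the single output is read as P("Class 0")):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for raw in (-8.0, 0.0, 8.0):
    p_class0 = sigmoid(raw)
    decision = "Class 0" if p_class0 >= 0.5 else "Class 1"
    print(f"raw={raw:+.1f}  P(Class 0)={p_class0:.4f}  ->  {decision}")
# raw=-8.0  P(Class 0)=0.0003  ->  Class 1
# raw=+0.0  P(Class 0)=0.5000  ->  Class 0
# raw=+8.0  P(Class 0)=0.9997  ->  Class 0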

This decision rule is similar to classifying only by the sign of your output. However, that would not allow your model to learn! Indeed, the gradient of this hard binary loss is null nearly everywhere, making it impossible for your model to learn from its errors, as they are not quantified properly.

That's why sigmoid and "binary_crossentropy" are used:
they are a smooth surrogate for that binary loss, with well-behaved gradients that enable learning.
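To make "quantified properly" concrete, here is a minimal numpy sketch of what binary_crossentropy computes (the example predictions are arbitrary):

import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip to avoid log(0), then average the per-sample losses
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0])
print(binary_crossentropy(y_true, np.array([0.9, 0.1])))  # ~0.105: confident and right
print(binary_crossentropy(y_true, np.array([0.6, 0.4])))  # ~0.511: unsure
print(binary_crossentropy(y_true, np.array([0.1, 0.9])))  # ~2.303: confident and wrong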

Also, please find more info about Softmax Function and Cross Entropy

Answered Oct 19 '22 by Yohan Grember