softmax and sigmoid function for the output layer

Tags: tensorflow, deep-learning, computer-vision, keras, theano

In deep learning implementations for object detection and semantic segmentation, I have seen output layers that use either sigmoid or softmax. It is not clear to me when to use which, since both seem able to support these tasks. Are there any guidelines for this choice?


asked Dec 31 '16 14:12

user288609

4 Answers

Use softmax() when you want a probability distribution, i.e. outputs that sum to 1. Use sigmoid when you want each output to lie between 0 and 1, but the outputs do not need to sum to 1.

In your case, you want to classify and choose between two alternatives. I would recommend softmax(), because it gives you a probability distribution to which you can apply the cross-entropy loss function.
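A minimal plain-Python sketch of the two-alternative case described above. The helper names `softmax` and `cross_entropy` and the example logits are illustrative assumptions, not a library API: the softmax output sums to 1, and the cross-entropy loss is the negative log of the probability assigned to the true class.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution that sums to 1."""
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_index):
    """Negative log-likelihood of the true class under the predicted distribution."""
    return -math.log(probs[true_index])

# Two alternatives, e.g. "background" vs "object" (hypothetical logits).
logits = [0.5, 2.0]
probs = softmax(logits)
print(probs)                     # two probabilities that sum to 1
print(cross_entropy(probs, 1))   # small loss: class 1 already has the higher probability
print(cross_entropy(probs, 0))   # larger loss if class 0 were the true class
```

Frameworks usually fuse these two steps into a single loss on the raw logits for numerical reasons; the sketch keeps them separate only to make each step visible.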

answered Oct 05 '22 20:10

martianwars

The sigmoid and softmax functions serve different purposes. For a detailed explanation of when to use sigmoid vs. softmax in neural network design, see the article "Classification: Sigmoid vs. Softmax."

Short summary:

If you have a multi-label classification problem where there is more than one "right answer" (the outputs are NOT mutually exclusive) then you can use a sigmoid function on each raw output independently. The sigmoid will allow you to have high probability for all of your classes, some of them, or none of them.
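A minimal sketch of the multi-label case in plain Python. The label names, logits, and the 0.5 decision threshold are hypothetical choices for illustration: each raw output goes through its own sigmoid and is thresholded independently, so any number of labels (including none) can be switched on, and the probabilities need not sum to 1.

```python
import math

def sigmoid(x):
    """Squash one raw score into (0, 1), independently of the other outputs."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical raw outputs for four labels.
labels = ["person", "car", "dog", "tree"]
logits = [2.3, -1.2, 0.8, -3.0]

probs = [sigmoid(z) for z in logits]
# Each label is decided on its own; 0.5 is an assumed threshold.
predicted = [name for name, p in zip(labels, probs) if p > 0.5]

print(predicted)    # ['person', 'dog']
print(sum(probs))   # not constrained to equal 1
```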

If you instead have a multi-class classification problem where there is only one "right answer" (the outputs are mutually exclusive), then use a softmax function. The softmax forces the probabilities of your output classes to sum to one, so in order to increase the probability of a particular class, your model must correspondingly decrease the probability of at least one of the other classes.
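The coupling described above can be seen in a small plain-Python sketch (the logits are made up for illustration): raising only the first class's logit increases its softmax probability and lowers every other class's probability, because the total stays fixed at 1. The single predicted class is then the argmax.

```python
import math

def softmax(logits):
    """Normalize raw scores so the class probabilities sum to 1."""
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

before = softmax([1.0, 0.5, -0.5])
after = softmax([3.0, 0.5, -0.5])   # only the first logit was raised

print([round(p, 3) for p in before])
print([round(p, 3) for p in after])

# Mutually exclusive classes: pick the single most probable one.
predicted_class = max(range(len(after)), key=lambda j: after[j])
print(predicted_class)  # 0
```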

answered Oct 05 '22 18:10

veritessa

Object detection is object classification applied to a sliding window over the image. In classification, the goal is to find the correct output within some class space. For example, if you detect 10 different objects and want to know which one is most likely present in a window, softmax is a good choice because of its property that the outputs of the whole layer sum to 1.

Semantic segmentation, on the other hand, assigns a class to each pixel of the image. I have done medical semantic segmentation, where the output is a binary image. This means you can use a sigmoid output to predict whether each pixel belongs to the specific class, because the sigmoid gives a value between 0 and 1 for each output class.
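A minimal plain-Python sketch of the binary-segmentation output described above. The 3x4 logit map and the 0.5 threshold are hypothetical: the sigmoid is applied to every pixel independently, and thresholding the resulting probabilities yields the binary mask.

```python
import math

def sigmoid(x):
    """Per-pixel probability that the pixel belongs to the foreground class."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical 3x4 map of raw network outputs (one logit per pixel).
logit_map = [
    [-2.0, -0.5, 1.5, 3.0],
    [-1.0, 0.7, 2.2, -0.3],
    [-3.0, -1.5, 0.1, -2.5],
]

# Sigmoid on every pixel independently, then an assumed 0.5 threshold
# to get a binary mask (1 = foreground, 0 = background).
prob_map = [[sigmoid(z) for z in row] for row in logit_map]
mask = [[1 if p > 0.5 else 0 for p in row] for row in prob_map]

for row in mask:
    print(row)
```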

answered Oct 05 '22 19:10

Thomas Pinetz

In general, softmax (a softmax classifier) is used when there are n classes. For binary (n = 2) classification, either sigmoid or softmax can be used.

Sigmoid: $S(x) = \frac{1}{1 + e^{-x}}$

Softmax: $\sigma(\mathbf{z})_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}$ for $j = 1, \dots, K$
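The sigmoid and softmax formulas above translate almost directly into plain Python. One addition that is not part of the formulas as written: the softmax subtracts the largest logit before exponentiating. This is a common stabilization trick that leaves the result unchanged (the factor $e^{-\max z}$ cancels in numerator and denominator) but keeps `math.exp` from overflowing on large inputs; the sigmoid is likewise split into two algebraically equivalent branches for the same reason.

```python
import math

def sigmoid(x):
    """S(x) = 1 / (1 + e^(-x)), written so that math.exp never overflows."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)              # equivalent form for negative x
    return e / (1.0 + e)

def softmax(z):
    """sigma(z)_j = e^(z_j) / sum_k e^(z_k), for j = 1..K.

    Subtracting max(z) first does not change the result but avoids overflow.
    """
    m = max(z)
    exps = [math.exp(zj - m) for zj in z]
    total = sum(exps)
    return [e / total for e in exps]

print(sigmoid(0.0))               # 0.5
print(softmax([1.0, 2.0, 3.0]))   # three values that sum to 1
print(softmax([1000.0, 1001.0]))  # works; a naive math.exp(1000.0) would raise OverflowError
```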

Softmax is a kind of multi-class sigmoid, but, as its formula shows, the outputs of all softmax units sum to 1. With sigmoid outputs this is not necessarily the case.
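For the binary case this relationship is exact: a two-unit softmax gives class 1 the probability $e^{z_1} / (e^{z_0} + e^{z_1}) = 1 / (1 + e^{-(z_1 - z_0)})$, which is a single sigmoid applied to the logit difference. A quick numerical check in plain Python (the logits are arbitrary):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical two-class logits.
z0, z1 = 0.3, 1.7

# P(class 1) from a 2-unit softmax, written out explicitly.
p_softmax = math.exp(z1) / (math.exp(z0) + math.exp(z1))
# P(class 1) from one sigmoid unit on the logit difference.
p_sigmoid = sigmoid(z1 - z0)

print(p_softmax, p_sigmoid)   # equal up to floating-point rounding
```

This is why either output layer works for n = 2: the softmax version simply carries one redundant degree of freedom.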

Digging deeper, you can also use sigmoid for multi-class classification. When you use softmax, you get a probability for each class (a joint distribution with a multinomial likelihood), and these probabilities must sum to one. If you use sigmoid for multi-class classification, each output behaves like a marginal distribution with a Bernoulli likelihood: p(y0 | x), p(y1 | x), and so on.
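A minimal plain-Python sketch contrasting the two likelihoods for the same logits and a one-hot target (the values are hypothetical). The softmax version is the multinomial negative log-likelihood, roughly what a categorical cross-entropy loss computes; the sigmoid version sums an independent Bernoulli negative log-likelihood per output, roughly what a binary cross-entropy loss computes.

```python
import math

def softmax(z):
    exps = [math.exp(zj) for zj in z]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

logits = [2.0, 0.5, -1.0]   # hypothetical raw outputs for 3 classes
target = [1, 0, 0]          # one-hot label: the true class is 0

# Softmax + multinomial likelihood: only the true class's probability enters,
# but it is coupled to the other classes through the shared normalization.
p = softmax(logits)
multinomial_nll = -sum(t * math.log(pj) for t, pj in zip(target, p))

# Sigmoid + independent Bernoulli likelihoods: each output is scored on its own
# as p(y_j = t_j | x); the outputs need not sum to 1.
q = [sigmoid(z) for z in logits]
bernoulli_nll = -sum(t * math.log(qj) + (1 - t) * math.log(1 - qj)
                     for t, qj in zip(target, q))

print(round(sum(p), 6), round(sum(q), 6))
print(multinomial_nll, bernoulli_nll)
```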

answered Oct 05 '22 19:10

Stephen

