Sigmoid function outputs a number between 0 and 1. Is this a probability or is it merely a 'yes or no' depending on whether it's above or below 0.5?
Minimal example:
Cats vs dogs binary classification. 0 is cat, 1 is dog.
Can I perform the following interpretation of the sigmoid output values:
Or would this be the right way to interpret the results:
Note how in first case we utilise the numeric value to also express probabilities, while in the second case we completely ignore the probability interpretation and collapse the answers to binary. Which is correct? Can you explain why?
Background context, feel free to skip this:
I've found a number of sources that suggest that yes, sigmoid output can be interpreted as probability:
tf.sigmoid(logits)
gives you the probabilities.
And a number of sources that suggest contrary, that sigmoid output cannot be interpreted as probabilities:
(bonus questions, answer to win a car!) Why are there so many contradicting answers? What do these answers differ in? I find it unlikely that it's just a lot of people being completely wrong about it - I'm thinking they're just talking about different cases or some different fundamental assumptions. What's the difference that I'm missing?
I know I can just use a softmax. I also know that sigmoid can be used for non-exclusive multi-class classification (Source multi 1, Source multi 2, Source multi 3) - although even then it's unclear whether such multiple sigmoids output probabilities of various classes or again simply a 'yes or no', but for multiple classes. In my case though, I'm interested in exclusive two-class (binary) classification, and whether sigmoid can be used to determine its probabilities, or should two-class softmax be used.
In binary classification, also called logistic regression, the sigmoid function is used to predict the probability of a binary variable.
sigmoid(z) will yield a value (a probability) between 0 and 1. Source yes 2 - The "output" must come from a function that satisfies the properties of a distribution function in order for us to interpret it as probabilities. (...) The "sigmoid function" satisfies these properties.
Sigmoid curves are also common in statistics as cumulative distribution functions (which go from 0 to 1), such as the integrals of the logistic density, the normal density, and Student's t probability density functions. The logistic sigmoid function is invertible, and its inverse is the logit function.
Sigmoid function produces similar results to step function in that the output is between 0 and 1. The curve crosses 0.5 at z=0, which we can set up rules for the activation function, such as: If the sigmoid neuron's output is larger than or equal to 0.5, it outputs 1; if the output is smaller than 0.5, it outputs 0.
A sigmoid function is not a probability density function (PDF), as it integrates to infinity. However, it corresponds to the cumulative probability function of the logistic distribution.
Regarding your interpretation of the results, even though the sigmoid is not a PDF, given that its values lie in the interval [0,1], you can still interpret them as a confidence index. With that in mind, I would say that your first interpretation is the most appropriate one, although you are free to implement whichever classifier suits your purposes better.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With