
Sigmoid output - can it be interpreted as probability?

The sigmoid function outputs a number between 0 and 1. Is this a probability, or is it merely a 'yes or no' depending on whether it's above or below 0.5?

Minimal example:

Cats vs dogs binary classification. 0 is cat, 1 is dog.

Can I perform the following interpretation of the sigmoid output values:

  • 0.9 - it's most certainly a dog
  • 0.52 - it's more likely to be a dog than a cat, but still quite unsure
  • 0.5 - completely undecided, could be either a cat or a dog
  • 0.48 - it's more likely to be a cat than a dog, but still quite unsure
  • 0.1 - it's most certainly a cat

Or would this be the right way to interpret the results:

  • 0.9 - it's a dog
  • 0.52 - it's a dog
  • 0.5 - completely undecided, could be either a cat or a dog
  • 0.48 - it's a cat
  • 0.1 - it's a cat

Note how in the first case we use the numeric value to also express probabilities, while in the second case we ignore the probability interpretation entirely and collapse the answers to binary. Which is correct? Can you explain why?


Background context, feel free to skip this:

I've found a number of sources that suggest that yes, sigmoid output can be interpreted as probability:

  • Source yes 1 - (...) sigmoid(z) will yield a value (a probability) between 0 and 1.
  • Source yes 2 - The "output" must come from a function that satisfies the properties of a distribution function in order for us to interpret it as probabilities. (...) The "sigmoid function" satisfies these properties.
  • Source yes 3 - tf.sigmoid(logits) gives you the probabilities.

And a number of sources that suggest the contrary, that sigmoid output cannot be interpreted as probabilities:

  • Source no 1 - (...) the raw values cannot necessarily be interpreted as raw probabilities!
  • Source no 2 - Sigmoid (...) is not a probability distribution function
  • Source no (and also yes) 3 - the short answer is no, however, depending on the loss you use, it may be closer to truth than you may think.

(bonus questions, answer to win a car!) Why are there so many contradicting answers? What do these answers differ in? I find it unlikely that it's just a lot of people being completely wrong about it - I'm thinking they're just talking about different cases or some different fundamental assumptions. What's the difference that I'm missing?


I know I can just use a softmax. I also know that sigmoid can be used for non-exclusive multi-class classification (Source multi 1, Source multi 2, Source multi 3) - although even then it's unclear whether such multiple sigmoids output probabilities of various classes or again simply a 'yes or no', but for multiple classes. In my case though, I'm interested in exclusive two-class (binary) classification, and whether sigmoid can be used to determine its probabilities, or should two-class softmax be used.

Voy asked Nov 26 '19 20:11



1 Answer

The sigmoid function is not a probability density function (PDF): a PDF must integrate to 1 over the real line, whereas the integral of the sigmoid diverges. However, the sigmoid is exactly the cumulative distribution function (CDF) of the standard logistic distribution.
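As a rough numerical check of the PDF/CDF distinction, here is a minimal sketch using only the standard library. The helper names (`sigmoid`, `logistic_pdf`, `trapezoid`) and the integration bounds are my own choices for illustration, not part of any library. It integrates the logistic density to recover the sigmoid, and shows that the area under the sigmoid itself keeps growing as the interval widens.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic sigmoid 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_pdf(z: float) -> float:
    """Density of the standard logistic distribution (derivative of sigmoid)."""
    s = sigmoid(z)
    return s * (1.0 - s)

def trapezoid(f, a: float, b: float, steps: int = 100_000) -> float:
    """Plain trapezoid-rule integral of f over [a, b] (illustrative helper)."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

# Integrating the logistic density up to x approximates sigmoid(x),
# i.e. the sigmoid behaves as a CDF. The lower bound -40 stands in for -infinity.
cdf_at_1 = trapezoid(logistic_pdf, -40.0, 1.0)
print(cdf_at_1, sigmoid(1.0))

# The density integrates to (approximately) 1 over a wide interval ...
area_pdf = trapezoid(logistic_pdf, -40.0, 40.0)
print(area_pdf)

# ... while the area under the sigmoid itself grows with the interval width,
# so it cannot be normalised as a density.
area_sigmoid_50 = trapezoid(sigmoid, -50.0, 50.0)
area_sigmoid_100 = trapezoid(sigmoid, -100.0, 100.0)
print(area_sigmoid_50, area_sigmoid_100)
```

The bounds are kept modest on purpose: `math.exp` overflows for arguments above roughly 709, so this naive `sigmoid` would raise an error for very negative inputs.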

Regarding your interpretation of the results: even though the sigmoid is not a PDF, its values lie in the interval [0,1], so you can still interpret them as a confidence index. With that in mind, I would say that your first interpretation is the more appropriate one, although you are free to implement whichever classifier suits your purposes better.
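To make the two readings from the question concrete, here is a small sketch. The function name `interpret`, the 0.5 threshold, and the rule that an exact 0.5 is reported as "dog" are assumptions made for illustration. How closely the score tracks a true probability depends on how the model was trained, which this snippet does not address.

```python
def interpret(p: float, threshold: float = 0.5):
    """Return (hard_label, confidence) for a sigmoid output p.

    Hypothetical helper: 0 means cat, 1 means dog, as in the question.
    A score exactly at the threshold is arbitrarily reported as "dog".
    """
    label = "dog" if p >= threshold else "cat"  # second reading: binary decision
    confidence = max(p, 1.0 - p)                # first reading: score for the chosen label
    return label, confidence

for p in (0.9, 0.52, 0.5, 0.48, 0.1):
    label, confidence = interpret(p)
    print(f"sigmoid output {p:.2f}: label={label}, confidence={confidence:.2f}")
```

The hard label throws away the difference between 0.52 and 0.9, while the confidence value keeps it, which is the distinction between the two lists in the question.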

edu_ answered Sep 18 '22 18:09