logistic / sigmoid function implementation numerical precision

In scipy.special.expit, the logistic function is implemented as follows:

from math import exp

def expit(x):
    if x < 0:
        a = exp(x)
        return a / (1 + a)
    else:
        return 1 / (1 + exp(-x))

However, I have seen implementations in other languages/frameworks that simply do

1 / (1 + exp(-x))

I am wondering how much benefit the scipy version actually brings.

For very negative x, the result approaches 0, and the simple version still gives that even if exp(-x) overflows to Inf.

colinfang asked May 06 '16 14:05




1 Answer

It's really just for stability - putting in values that are very large in magnitude might return unexpected results otherwise.

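If expit were implemented just as 1 / (1 + exp(-x)), then passing -710 would overflow, because exp(710) is too big to be a double, whereas -709 would give a value close to zero as expected. What happens on overflow depends on the environment: under IEEE 754 arithmetic exp returns inf and the result collapses to exactly 0 (usually with an overflow warning or flag), while Python's math.exp raises OverflowError.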
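To make that boundary concrete, here is a minimal sketch that uses only Python's standard math module. The names naive_sigmoid and try_naive are made up for this illustration, and the behaviour shown is that of CPython's math.exp; other exp implementations may return inf instead of raising.

```python
import math

def naive_sigmoid(x):
    """Logistic function written the simple way: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def try_naive(x):
    """Return the naive result, or the exception name if exp overflows."""
    try:
        return naive_sigmoid(x)
    except OverflowError as err:
        return type(err).__name__

print(try_naive(-709.0))  # a tiny positive float: exp(709) still fits in a double
print(try_naive(-710.0))  # OverflowError: math.exp(710) does not fit in a double

# Where exp returns inf instead of raising (IEEE 754 behaviour), the naive
# form collapses to exactly zero:
print(1.0 / (1.0 + float("inf")))  # 0.0
```

So the simple form either fails outright or silently rounds to zero once -x passes roughly 709.78, depending on how the surrounding environment reports overflow.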

The branching in the code avoids this scenario: exp is only ever called with a non-positive argument, so its result lies between 0 and 1 and cannot overflow.
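As a rough check of how much the branch matters, the sketch below compares the two forms. It is a plain-Python stand-in built on math.exp, not SciPy's actual source, and the function names are hypothetical.

```python
import math

def naive_sigmoid(x):
    """Simple form; math.exp(-x) overflows for very negative x."""
    return 1.0 / (1.0 + math.exp(-x))

def branched_sigmoid(x):
    """Only ever calls exp on a non-positive argument, so exp cannot overflow."""
    if x < 0:
        a = math.exp(x)                # 0 <= a <= 1
        return a / (1.0 + a)
    return 1.0 / (1.0 + math.exp(-x))  # 0 <= exp(-x) <= 1

# Where exp(-x) fits in a double, the two forms agree closely.
for x in range(-700, 701, 50):
    assert math.isclose(naive_sigmoid(x), branched_sigmoid(x), rel_tol=1e-12)

# At the extremes the branched form still returns a sensible value.
for x in (-1000.0, -710.0, 710.0, 1000.0):
    print(x, branched_sigmoid(x))
```

In other words, the two versions are interchangeable for moderate inputs; the branch only pays off for very negative x, where the simple form would overflow.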

See also this question and answer on Stack Overflow.

Alex Riley answered Oct 18 '22 22:10
