 

Does the sigmoid function really matter in Logistic Regression?

I implemented a binary Logistic Regression classifier. Just to play around, I replaced the sigmoid function, 1 / (1 + exp(-z)), with tanh. The results were exactly the same, with the same 0.5 threshold for classification, even though tanh has range (-1, 1) while sigmoid has range (0, 1).

Does it really matter that we use the sigmoid function or can any differentiable non-linear function like tanh work?
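A quick way to probe this numerically (a sketch in plain Python; the function names are mine, not from any library) is to compare the two classification rules directly. With a fixed 0.5 threshold they only disagree on a narrow band of raw scores:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def classify_sigmoid(z):
    # sigmoid(z) >= 0.5 exactly when z >= 0
    return 1 if sigmoid(z) >= 0.5 else 0

def classify_tanh(z):
    # tanh(z) >= 0.5 exactly when z >= atanh(0.5) ≈ 0.549
    return 1 if math.tanh(z) >= 0.5 else 0

# The two rules agree for most z, but disagree for z in [0, atanh(0.5)).
for z in (-2.0, -0.1, 0.3, 2.0):
    print(z, classify_sigmoid(z), classify_tanh(z))
```

Running this shows the rules agree at z = -2.0, -0.1, and 2.0, but disagree at z = 0.3, where sigmoid(0.3) ≈ 0.57 crosses the threshold while tanh(0.3) ≈ 0.29 does not.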

Thanks.

asked Feb 02 '14 by rahulm

People also ask

Why do we need sigmoid function in logistic regression?

What is the Sigmoid Function? In order to map predicted values to probabilities, we use the Sigmoid function. The function maps any real value into another value between 0 and 1. In machine learning, we use sigmoid to map predictions to probabilities.

Why sigmoid function is important?

The main reason why we use sigmoid function is because it exists between (0 to 1). Therefore, it is especially used for models where we have to predict the probability as an output. Since probability of anything exists only between the range of 0 and 1, sigmoid is the right choice. The function is differentiable.
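This squashing behaviour is easy to verify (a minimal sketch, plain Python, nothing library-specific): every real input lands strictly between 0 and 1, with 0.5 at z = 0.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Large negative z approaches 0, large positive z approaches 1,
# and z = 0 gives exactly 0.5 -- the usual classification threshold.
for z in (-10, -1, 0, 1, 10):
    print(z, sigmoid(z))
```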

Is sigmoid function logistic function?

The sigmoid function also called a logistic function. So, if the value of z goes to positive infinity then the predicted value of y will become 1 and if it goes to negative infinity then the predicted value of y will become 0.

Why is sigmoid not preferred?

Not zero-centered: Sigmoid outputs are not zero-centered, which is undesirable because it can indirectly introduce undesirable zig-zagging dynamics in the gradient updates for the weights.


1 Answer

Did you also change the function in the training, or did you just use the same training method and then swap the sigmoid for tanh?

I think what has very likely happened is the following. Have a look at the graphs of sigmoid and tanh:

sigmoid: http://www.wolframalpha.com/input/?i=plot+sigmoid%28x%29+for+x%3D%28-1%2C+1%29
tanh: http://www.wolframalpha.com/input/?i=plot+tanh%28x%29+for+x%3D%28-1%2C+1%29

We can see that tanh reaches y = 0.5 at around x ≈ 0.55, whereas sigmoid crosses y = 0.5 at x = 0; at x ≈ 0.5, sigmoid is already at roughly y = 0.62. So with a fixed 0.5 threshold, the two rules only disagree on points whose sigmoid output falls between about 0.5 and 0.62. What has probably happened is that your data doesn't contain any point in that narrow band, hence you get exactly the same results. Try printing the sigmoid values for your data and see if any fall between 0.5 and 0.62.
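That check might look like the following sketch (the raw scores `z` here are made up for illustration; in practice they would be your model's w·x + b values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical raw scores from a trained model.
z = np.array([-3.1, -0.8, 0.2, 0.4, 1.5, 2.7])
p = sigmoid(z)

# Points with sigmoid output in (0.5, ~0.62) are exactly the ones a
# tanh rule with the same 0.5 threshold would classify differently.
ambiguous = p[(p > 0.5) & (p < 0.62)]
print(ambiguous)
```

If this array is empty for your data, the two classifiers are guaranteed to agree on every point.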

The reason for using the sigmoid function is that it arises from a probabilistic, maximum-likelihood derivation of logistic regression. Other functions may behave very similarly in practice, but they lack this probabilistic grounding. For details see, for example, http://luna.cas.usf.edu/~mbrannic/files/regression/Logistic.html or http://www.cs.cmu.edu/~tom/mlbook/NBayesLogReg.pdf
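For context, here is the standard derivation (not from those references verbatim): if you model the log-odds of the positive class as a linear function of the features, solving for the probability gives exactly the sigmoid.

```latex
\log\frac{p}{1-p} = z = w^\top x + b
\quad\Longrightarrow\quad
p = \frac{1}{1 + e^{-z}}
```

This is why the sigmoid output can legitimately be read as a probability, which an arbitrary squashing function like tanh cannot claim.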

answered Oct 19 '22 by Laky