
Why do sigmoid functions work in Neural Nets?


I have just started programming for neural networks. I am currently working on understanding how a backpropagation (BP) neural net works. While the algorithm for training BP nets is quite straightforward, I was unable to find any text on why the algorithm works. More specifically, I am looking for some mathematical reasoning to justify using sigmoid functions in neural nets, and for what makes them able to mimic almost any data distribution thrown at them.

Thanks!

Anshul Porwal asked Jul 26 '12 20:07


People also ask

What is the purpose of sigmoid function?

The sigmoid function acts as an activation function in machine learning: it adds non-linearity to a model. In simple terms, it decides which values are passed on as output and which are not. It is one of several activation functions commonly used in machine learning and deep learning.
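
A minimal sketch of the function itself, using NumPy (the function name is just illustrative):

```python
import numpy as np

def sigmoid(x):
    # Squashes any real input into (0, 1); the curve is non-linear,
    # which is what lets stacked layers model non-linear relationships.
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # ~[0.0067, 0.5, 0.9933]
```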

Why do we need to use a sigmoid function when using backpropagation?

The backpropagation learning rule relies on the fact that the sigmoid function is differentiable, which makes it possible to characterize the rate of change in the output layer error with respect to a change in a particular weight (even if the weight is multiple layers away from the output).
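
The convenient identity sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)) is what the weight-update step exploits; a small sketch of that derivative (names are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    # Derivative expressed through the sigmoid's own output:
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return s * (1.0 - s)

# Backpropagation multiplies this derivative into the chain rule when
# computing how the output error changes with respect to each weight.
print(sigmoid_prime(0.0))  # 0.25, the sigmoid's maximum slope
```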

Why is the sigmoid function suitable for use in a binary classifier?

The sigmoid is a function whose output lies strictly between 0 and 1, asymptotically approaching both values. This makes it very handy for binary classification with 0 and 1 as the potential output values.
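
Because the output lives in (0, 1) it can be read as a probability and thresholded; a toy illustration (the 0.5 cutoff is just a common convention):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

logits = np.array([-2.0, 0.3, 4.0])      # raw scores from some model
probs = sigmoid(logits)                  # mapped into (0, 1)
labels = (probs >= 0.5).astype(int)      # hypothetical decision rule
print(probs, labels)                     # ~[0.12 0.57 0.98] [0 1 1]
```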

What is sigmoid function and why go for sigmoid neurons?

A basic building block of deep neural networks is the sigmoid neuron. Sigmoid neurons are similar to perceptrons, but they are slightly modified so that the output of a sigmoid neuron is much smoother than the step-function output of a perceptron.
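
A quick way to see the "smoother" behaviour is to compare a perceptron-style step output with the sigmoid on the same pre-activations (a small sketch, names illustrative):

```python
import numpy as np

def step(x):
    # Perceptron-style output: jumps abruptly from 0 to 1 at the threshold.
    return (x >= 0).astype(float)

def sigmoid(x):
    # Sigmoid-neuron output: changes gradually, so small weight changes
    # produce small output changes.
    return 1.0 / (1.0 + np.exp(-x))

z = np.linspace(-1, 1, 5)
print(step(z))     # [0. 0. 1. 1. 1.]
print(sigmoid(z))  # ~[0.27 0.38 0.5  0.62 0.73]
```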


1 Answer

The sigmoid function introduces non-linearity into the network. Without a non-linear activation function, the net can only learn functions which are linear combinations of its inputs. With a sigmoid non-linearity, a feedforward net with a single hidden layer can approximate essentially any continuous function to arbitrary accuracy; this result is called the universal approximation theorem, or Cybenko's theorem, after the gentleman who proved it in 1989. Wikipedia is a good place to start, and it has a link to the original paper (the proof is somewhat involved though). The reason you would use a sigmoid as opposed to something else is that it is continuous and differentiable, its derivative is very fast to compute (as opposed to the derivative of tanh, which has otherwise similar properties), and it has a limited range (from 0 to 1, exclusive).
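
To make the non-linearity point concrete, here is a small, hypothetical NumPy sketch of a one-hidden-layer sigmoid network trained by backpropagation on XOR, a function no purely linear model can fit (the layer size, learning rate, and iteration count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# XOR: not linearly separable, so a net with only linear layers cannot fit it.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output

lr = 1.0
for _ in range(5000):
    # Forward pass through the sigmoid non-linearities
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: the chain rule, using sigmoid'(z) = s * (1 - s)
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(out.round(2))  # should approach [[0], [1], [1], [0]]
```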

mbatchkarov answered Sep 22 '22 03:09