I don't quite understand why a sigmoid function is seen as more useful (for neural networks) than a step function... hoping someone can explain this for me. Thanks in advance.
Sigmoid function: a general mathematical function with an S-shaped (sigmoid) curve that is bounded, differentiable, and real-valued. Logistic function: a particular sigmoid function that is widely used in binary classification problems, e.g. in logistic regression.
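For concreteness, the logistic function usually meant in this context is

$$\sigma(x) = \frac{1}{1 + e^{-x}},$$

which is bounded in $(0, 1)$, differentiable everywhere, and defined for all real $x$, matching the sigmoid properties above.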
The main reason we use the sigmoid function is that its output lies between 0 and 1. It is therefore especially used for models where we have to predict a probability as the output: since probabilities only exist in the range 0 to 1, the sigmoid is the right choice.
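As a minimal sketch of this probability interpretation (the logit values below are made-up example data, not from any particular model):

```python
import numpy as np

def sigmoid(z):
    # Squash raw model outputs (logits) into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([-2.0, 0.0, 3.0])   # hypothetical raw outputs
probs = sigmoid(logits)               # now interpretable as probabilities
labels = (probs >= 0.5).astype(int)   # threshold at 0.5 for class labels
print(probs)    # [0.119 0.5   0.953] (rounded)
print(labels)   # [0 1 1]
```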
The step function is one of the simplest kinds of activation function. We choose a threshold value, and if the net input, say y, is greater than the threshold, the neuron is activated; otherwise it is not.
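A minimal sketch of such a threshold activation (the function name `step` and the default threshold of 0 are illustrative choices):

```python
import numpy as np

def step(y, threshold=0.0):
    # Heaviside step activation: the neuron fires (outputs 1) only
    # when the net input y exceeds the threshold; otherwise it outputs 0.
    return np.where(y > threshold, 1.0, 0.0)

net_input = np.array([-0.3, 0.1, 2.5])
print(step(net_input))  # [0. 1. 1.]
```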
The output shifts abruptly from 0 to 1, which may not fit the data well. The function is not differentiable at the threshold (and its derivative is 0 everywhere else), so gradient-based training is impossible.
Disadvantages of the sigmoid function: it is particularly prone to the vanishing gradient problem, and its output is not zero-centred.
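To see where the vanishing gradient comes from: the sigmoid's derivative is $\sigma(x)(1 - \sigma(x))$, which never exceeds 0.25, so gradients shrink geometrically as they pass backwards through stacked sigmoid layers. A small sketch (the layer count of 10 is arbitrary):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid: sigma(x) * (1 - sigma(x)), at most 0.25
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid_grad(0.0))  # 0.25, the maximum possible value
print(0.25 ** 10)         # ~9.5e-07: upper bound on the gradient factor
                          # contributed by 10 stacked sigmoid layers
```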
The (Heaviside) step function is typically only useful within single-layer perceptrons, an early type of neural network that can be used for classification when the input data is linearly separable.
However, multi-layer neural networks (multi-layer perceptrons) are of more interest because they are general function approximators and can distinguish data that is not linearly separable.
Multi-layer perceptrons are trained using backpropagation. A requirement for backpropagation is a differentiable activation function. That's because backpropagation uses gradient descent to update the network weights, and computing those gradients requires differentiating the activation function.
The Heaviside step function is non-differentiable at x = 0 and its derivative is 0 elsewhere. This means gradient descent won't be able to make progress in updating the weights and backpropagation will fail.
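A numerical sketch of this failure (a toy single-weight network with squared-error loss; all values here are made up for illustration): the gradient-descent update is proportional to the activation's derivative, so with a sigmoid the weight moves, while with the step function the derivative term, and hence the update, is zero almost everywhere.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One gradient-descent step for a single weight w, input x, target t,
# with loss = 0.5 * (out - t)**2 and a sigmoid activation.
w, x, t, lr = 0.5, 1.0, 1.0, 0.1
out = sigmoid(w * x)
grad = (out - t) * out * (1.0 - out) * x  # chain rule; the out*(1-out)
                                          # factor is the sigmoid derivative
w -= lr * grad
print(w)  # the weight moved; replacing the derivative factor with the
          # step function's derivative (0 almost everywhere) gives grad = 0
```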
The sigmoid or logistic function does not have this shortcoming and this explains its usefulness as an activation function within the field of neural networks.
It depends on the problem you are dealing with. In case of simple binary classification, a step function is appropriate. Sigmoids can be useful when building more biologically realistic networks by introducing noise or uncertainty. Another, but completely different, use of sigmoids is for numerical continuation, i.e. when doing bifurcation analysis with respect to some parameter in the model. Numerical continuation is easier with smooth systems (and very tricky with non-smooth ones).