I have a quick question regarding backpropagation. I am looking at the following:
http://www4.rgu.ac.uk/files/chapter3%20-%20bp.pdf
In this paper, it says to calculate the error of the neuron as
Error = Output(i) * (1 - Output(i)) * (Target(i) - Output(i))
The part of the equation that I don't understand is the Output(i) * (1 - Output(i)) term. The paper says this term is needed because of the sigmoid function - but I still don't understand why it would be necessary.
What would be wrong with using
Error = abs(Output(i) - Target(i))
?
Is the error function the same regardless of the neuron's activation/transfer function?
For a sigmoid gate, the local gradient is output * (1 - output), which is at most 0.25 and approaches zero as the unit saturates. During backpropagation, this local gradient is multiplied by the gradient at the gate's output. Thus, if the local gradient is very small, it will kill the gradient and the network will not learn. This vanishing gradient problem is mitigated by ReLU.
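To make the saturation concrete, here is a minimal sketch (assuming NumPy; not from the original post) comparing the local gradient of a sigmoid gate, output * (1 - output), with a ReLU gate's local gradient as the input grows:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # local gradient of a sigmoid gate

def relu_grad(x):
    return (x > 0).astype(float)  # local gradient of a ReLU gate

xs = np.array([0.0, 2.0, 5.0, 10.0])
print(sigmoid_grad(xs))  # ~[0.25, 0.105, 0.0066, 0.000045] -> shrinks toward 0
print(relu_grad(xs))     # [0., 1., 1., 1.] -> stays 1 for any positive input
```

The sigmoid's local gradient never exceeds 0.25 and decays rapidly for large inputs, so multiplying many of them together during backpropagation drives the gradient toward zero; the ReLU's local gradient stays at 1 on the active side.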
The Sigmoid function is often used as an activation function in the various layers of a neural network. In short, it determines whether a node should be activated or not, and thereby whether the node contributes to the calculations of the network.
Another classic sigmoid is the “error function” (or erf). It's sharper than tanh and approaches the asymptotes much more closely for large inputs. One application of erf is efficient computation of the convolution of the Gaussian filter with a box, the 1D analog of a Gaussian blur applied to a rectangle.
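As a sketch of that application (plain Python with math.erf; the function name and the example values are mine), the convolution of the indicator of a box [a, b] with a Gaussian of standard deviation sigma has a closed form in erf:

```python
import math

def box_gauss_conv(x, a, b, sigma):
    """Convolution of the indicator of [a, b] with a Gaussian kernel of
    std dev sigma, evaluated at x; the closed form is
    0.5 * (erf((x - a) / (sigma * sqrt(2))) - erf((x - b) / (sigma * sqrt(2))))."""
    s = sigma * math.sqrt(2.0)
    return 0.5 * (math.erf((x - a) / s) - math.erf((x - b) / s))

# Far inside the box the blurred value is ~1, far outside ~0,
# and exactly 0.5 at each edge -- a 1D "Gaussian blur" of a rectangle.
print(box_gauss_conv(0.0, -1.0, 1.0, 0.1))   # ~1.0
print(box_gauss_conv(1.0, -1.0, 1.0, 0.1))   # 0.5
print(box_gauss_conv(2.0, -1.0, 1.0, 0.1))   # ~0.0
```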
A sigmoid unit in a neural network. When the activation function for a neuron is a sigmoid, the output of the unit is guaranteed to always be between 0 and 1. Also, since the sigmoid is a non-linear function, the output of the unit is a non-linear function of the weighted sum of its inputs.
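For concreteness, a minimal sketch of such a unit (assuming NumPy; the weights, inputs, and bias are made up): it squashes the weighted input sum into (0, 1):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_unit(inputs, weights, bias):
    """Forward pass of a single sigmoid unit: a non-linear
    squashing of the weighted input sum into (0, 1)."""
    z = np.dot(weights, inputs) + bias   # weighted sum of inputs
    return sigmoid(z)                    # always strictly between 0 and 1

x = np.array([0.5, -1.2, 3.0])
w = np.array([0.4, 0.7, -0.2])
print(sigmoid_unit(x, w, bias=0.1))  # ~0.24, inside (0, 1)
```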
The reason you need this is that you are calculating the derivative of the error function with respect to the neuron's inputs.
When you take that derivative via the chain rule, you need to multiply by the derivative of the neuron's activation function (which here happens to be a sigmoid).
Here's the important math.
Calculate the derivative of the error with respect to the neuron's input via the chain rule:
E = -(target - output)^2 (negated squared error, so gradient ascent on E minimizes the error)
dE/dinput = dE/doutput * doutput/dinput
Work out the two factors:
dE/doutput = 2 * (target - output)
output = sigmoid(input)
doutput/dinput = output * (1 - output) (the derivative of the sigmoid function)
therefore:
dE/dinput = 2 * (target - output) * output * (1 - output)
Up to the constant factor of 2, which is absorbed into the learning rate, this is exactly the formula from the paper - and the output * (1 - output) term you asked about is precisely the sigmoid's derivative.
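Putting the chain together, here is a minimal numeric sketch (a single sigmoid neuron with squared error; variable names and values are mine, not the paper's) showing where the output * (1 - output) factor enters a weight update:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One training example for a single sigmoid neuron
x = np.array([0.3, 0.8])     # inputs
w = np.array([0.5, -0.4])    # weights
target = 1.0
lr = 0.1                     # learning rate

# Forward pass
net = np.dot(w, x)           # weighted sum: the "input" to the sigmoid
output = sigmoid(net)

# Backward pass: dE/dnet = dE/doutput * doutput/dnet.
# This is the paper's Error term; the constant 2 from the derivative
# is folded into the learning rate:
delta = (target - output) * output * (1 - output)

# Gradient-based weight update; dnet/dw = x completes the chain rule
w += lr * delta * x
print(output, delta, w)
```

Note that abs(Output - Target) could still serve as a way to *measure* the error, but the quantity backpropagation actually needs is the gradient, and that is where the activation function's derivative inevitably appears.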