
Calculate the error using a sigmoid function in backpropagation

I have a quick question regarding backpropagation. I am looking at the following:

http://www4.rgu.ac.uk/files/chapter3%20-%20bp.pdf

In this paper, it says to calculate the error of the neuron as

Error = Output(i) * (1 - Output(i)) * (Target(i) - Output(i))

The part of the equation that I don't understand is the Output(i) * (1 - Output(i)) term. The paper says this term is needed because of the sigmoid function - but I still don't understand why it would be necessary.

What would be wrong with using

Error = abs(Output(i) - Target(i))

?

Or is the error function the same regardless of the neuron's activation/transfer function?

Sherlock asked Jul 24 '12

People also ask

What is the problem with sigmoid during backpropagation?

During backpropagation, this local gradient is multiplied with the gradient of this gate's output. Thus, if the local gradient is very small, it will kill the gradient and the network will not learn. This problem of vanishing gradients is solved by ReLU.
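A small illustration of this (my own sketch, not from the question): the sigmoid's derivative s(x) * (1 - s(x)) never exceeds 0.25, so each sigmoid layer multiplies the backpropagated gradient by a factor of at most 0.25, shrinking it exponentially with depth.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_deriv(x):
    # Derivative of the sigmoid, expressed via its output: s * (1 - s)
    s = sigmoid(x)
    return s * (1.0 - s)

# Best case for the gradient: every pre-activation is 0, where the
# derivative reaches its maximum of exactly 0.25.
grad = 1.0
for layer in range(10):
    grad *= sigmoid_deriv(0.0)

print(grad)  # 0.25 ** 10 ≈ 9.54e-07 -- almost no gradient left after 10 layers
```

Even in this best case, ten layers leave less than a millionth of the original gradient, which is the vanishing-gradient problem in a nutshell.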

What is sigmoid function in backpropagation?

The Sigmoid function is often used as an activation function in the various layers of a neural network. Put shortly, this means that it determines if a node should be activated or not, and thereby if the node should contribute to the calculations of the network or not.

Is sigmoid an error function?

Another classic sigmoid is the “error function” (or erf). It's sharper than tanh and approaches the asymptotes much more closely for large inputs. One application of erf is efficient computation of the convolution of the Gaussian filter with a box, the 1D analog of a Gaussian blur applied to a rectangle.
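That convolution application has a simple closed form. As a sketch (the function name and values here are my own, not from the text), blurring the indicator of a 1D box [a, b] with a Gaussian of standard deviation sigma reduces to a difference of two erf terms:

```python
import math

def blurred_box(x, a, b, sigma):
    """Convolution of a Gaussian pdf (std sigma) with the indicator of [a, b],
    evaluated at x: the 1D analog of Gaussian-blurring a rectangle."""
    s = sigma * math.sqrt(2.0)
    return 0.5 * (math.erf((x - a) / s) - math.erf((x - b) / s))

# Deep inside the box the blurred value is ~1, far outside it is ~0,
# and it passes through exactly 0.5 at each edge.
print(blurred_box(0.0, -1.0, 1.0, 0.1))   # ~1.0 (center of the box)
print(blurred_box(-1.0, -1.0, 1.0, 0.1))  # 0.5 (left edge)
```

This is why erf shows up in fast analytic Gaussian blurs: no numerical integration is needed, just two erf evaluations per sample.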

What is sigmoid function in neural network?

A sigmoid unit in a neural network. When the activation function for a neuron is a sigmoid function it is a guarantee that the output of this unit will always be between 0 and 1. Also, as the sigmoid is a non-linear function, the output of this unit would be a non-linear function of the weighted sum of inputs.
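A minimal sketch of such a unit (the weights, inputs, and bias below are made-up values for illustration): a weighted sum of inputs is pushed through 1 / (1 + e^(-x)), so the output is guaranteed to lie strictly between 0 and 1.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_unit(inputs, weights, bias):
    # Weighted sum of inputs plus bias, then the non-linear squashing step.
    pre_activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(pre_activation)

out = sigmoid_unit([0.5, -1.2, 3.0], [0.4, 0.1, -0.6], bias=0.2)
print(0.0 < out < 1.0)  # True: the sigmoid bounds the output
```

However large the weighted sum gets in either direction, the output only approaches 0 or 1 asymptotically, which is what makes the unit's output interpretable as a soft activation.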




1 Answer

The reason you need this is that you are calculating the derivative of the error function with respect to the neuron's inputs.

When you take the derivative via the chain rule, you need to multiply by the derivative of the neuron's activation function (which happens to be a sigmoid).

Here's the important math.

Calculate the derivative of the error with respect to the neuron's input via the chain rule:

E = -(target - output)^2

dE/dinput = dE/doutput * doutput/dinput

Work out dE/doutput:

dE/doutput = 2 * (target - output)

Work out doutput/dinput:

output = sigmoid(input)

doutput/dinput = output * (1 - output)    (derivative of the sigmoid function)

therefore:

dE/dinput = 2 * (target - output) * output * (1 - output)

Up to the constant factor of 2 (which gets absorbed into the learning rate), this is exactly the Error term from the paper. This also answers your second question: the Output(i) * (1 - Output(i)) factor comes directly from the activation function, so a different activation would give a different error term.
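You can sanity-check this derivation numerically (my own sketch, not part of the original answer): compare the analytic gradient 2 * (target - output) * output * (1 - output) against a central finite-difference estimate of dE/dinput.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def error(inp, target):
    # Same sign convention as the derivation above: E = -(target - output)^2
    return -(target - sigmoid(inp)) ** 2

def analytic_grad(inp, target):
    # dE/dinput = 2 * (target - output) * output * (1 - output)
    out = sigmoid(inp)
    return 2.0 * (target - out) * out * (1.0 - out)

inp, target, h = 0.3, 1.0, 1e-6
numeric = (error(inp + h, target) - error(inp - h, target)) / (2.0 * h)
print(abs(analytic_grad(inp, target) - numeric) < 1e-6)  # True: gradients agree
```

A check like this (gradient checking) is a standard way to catch mistakes when implementing backpropagation by hand.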
mikera answered Oct 10 '22