
Why do we take the derivative of the transfer function in the backpropagation algorithm?

What is the concept behind taking the derivative? It makes sense that, to teach a system, we have to adjust its weights. But why do we do this using the derivative of the transfer function? What is it about the derivative that helps us? I know the derivative is the slope of a continuous function at a given point, but what does it have to do with the problem?

asked Oct 19 '22 by auryndb

1 Answer

You already know that the cost function is a function with the weights as its variables. For now, consider it as f(W).

Our main motive here is to find a W for which we get the minimum value for f(W).
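As a concrete sketch (everything here is a made-up illustration, not part of the original answer), f(W) below is just the mean squared error of a tiny linear model on random data, written in Python:

    import numpy as np

    # Made-up training data: 5 examples, 3 input features each
    X = np.random.randn(5, 3)
    y = np.random.randn(5)

    def f(W):
        """Cost as a function of the weight vector W (mean squared error)."""
        predictions = X @ W      # linear model: one prediction per example
        errors = predictions - y
        return np.mean(errors ** 2)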

One way to do this would be to plot f on one axis and W on the other and read off the minimum... but remember that here W is not a single variable but a collection of variables, so we cannot simply draw that plot.

So what is another way? It can be as simple as changing the values of W and checking whether we get a lower value of f than we did with the previous W.

But taking random values for all the variables in W can be a tedious task.

So what we do instead is start with random values for W, look at the output of f(W), and compute the slope with respect to each variable (we get this by partially differentiating the function with respect to the i-th variable and plugging in the current value of W).
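A rough sketch of that step, reusing the toy f from above: the partial derivative with respect to the i-th weight can be approximated numerically by nudging just that weight (this finite-difference version is only for illustration; backpropagation computes the same quantities analytically):

    def numerical_gradient(f, W, eps=1e-6):
        """Approximate the slope of f at W along each weight separately."""
        grad = np.zeros_like(W)
        for i in range(len(W)):
            W_plus = W.copy()
            W_plus[i] += eps                    # nudge only the i-th variable
            grad[i] = (f(W_plus) - f(W)) / eps  # slope along that axis
        return grad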

Now, once we know the slope at that point in space, we move a small step downhill along the slope (this small factor is the learning rate, termed alpha in gradient descent). This goes on until the slope becomes (close to) zero, which tells us we have reached the lowest point of the graph (a graph in n dimensions, f versus W, W being a collection of n variables). In backpropagation, the derivative of the transfer function appears because of the chain rule: it is one of the factors needed to get the slope of the cost with respect to each weight.
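Putting it together, a minimal gradient-descent loop could look like the sketch below; alpha, the step count, and the stopping threshold are arbitrary choices for illustration:

    alpha = 0.1                   # the "little factor": learning rate
    W = np.random.randn(3)        # start from random weights

    for step in range(1000):
        grad = numerical_gradient(f, W)
        W = W - alpha * grad                 # move downhill, against the slope
        if np.linalg.norm(grad) < 1e-4:      # slope ~ 0: we are near a minimum
            break

    print("final cost:", f(W))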

answered Oct 27 '22 by Shrijit Basak