What is the concept behind taking the derivative? It's interesting that for somehow teaching a system, we have to adjust its weights. But why are we doing this using a derivation of the transfer function. What is in derivation that helps us. I know derivation is the slope of a continuous function at a given point, but what does it have to do with the problem.
You must already know that the cost function is a function with the weights as the variables. For now consider it as f(W).
Our main motive here is to find a W for which we get the minimum value for f(W).
One of the ways for doing this is to plot function f in one axis and W in another....... but remember that here W is not just a single variable but a collection of variables.
So what can be the other way? It can be as simple as changing values of W and see if we get a lower value or not than the previous value of W.
But taking random values for all the variables in W can be a tedious task.
So what we do is, we first take random values for W and see the output of f(W) and the slope at all the values of each variable(we get this by partially differentiating the function with the i'th variable and putting the value of the i'th variable).
now once we know the slope at that point in space we move a little further towards the lower side in the slope (this little factor is termed alpha in gradient descent) and this goes on until the slope gives a opposite value stating we already reached the lowest point in the graph(graph with n dimensions, function vs W, W being a collection of n variables).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With