I am reading this tutorial provided on the home page of the Theano documentation, and I am not sure about the code given under the gradient descent section. In particular, I have doubts about the for loop.
You initialize the 'param_update' variable to zero:
param_update = theano.shared(param.get_value()*0., broadcastable=param.broadcastable)
and then you update its value in the remaining two lines:
updates.append((param, param - learning_rate*param_update))
updates.append((param_update, momentum*param_update + (1. - momentum)*T.grad(cost, param)))
Why do we need it?
I guess I am getting something wrong here. Can you guys help me?
The initialization of param_update using theano.shared(.) only tells Theano to reserve a variable that will be used by Theano functions. This initialization code is only called once; it will not be used later on to reset the value of param_update to 0.
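Here is a minimal sketch of that behaviour (my own toy example, not from the tutorial): the shared variable gets its value once at construction time, and afterwards only the updates passed to theano.function change it.

import numpy as np
import theano

# Toy shared variable, initialized exactly once at construction time.
counter = theano.shared(np.float64(0.), name='counter')

# Each call returns the current value and then adds 1 to it.
step = theano.function([], counter, updates=[(counter, counter + 1.)])

step()                       # returns 0.0, counter becomes 1.0
step()                       # returns 1.0, counter becomes 2.0
print(counter.get_value())   # 2.0 -- the initialization never ran again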
The actual value of param_update will be updated according to the last line

updates.append((param_update, momentum*param_update + (1. - momentum)*T.grad(cost, param)))

when the train function, which was constructed with this update dictionary as an argument ([23] in the tutorial), is called:
train = theano.function([mlp_input, mlp_target], cost,
                        updates=gradient_updates_momentum(cost, mlp.params, learning_rate, momentum))
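For reference, the update-building function that call refers to looks roughly like this (a sketch reconstructed from the lines quoted in the question; the for loop is the part you are asking about):

import theano
import theano.tensor as T

def gradient_updates_momentum(cost, params, learning_rate, momentum):
    updates = []
    for param in params:
        # One shared variable per parameter, same shape, filled with zeros.
        # This line runs only once, when the update list is built.
        param_update = theano.shared(param.get_value()*0.,
                                     broadcastable=param.broadcastable)
        # Move the parameter along the stored update direction.
        updates.append((param, param - learning_rate*param_update))
        # Blend the previous direction with the new gradient (momentum rule).
        updates.append((param_update,
                        momentum*param_update + (1. - momentum)*T.grad(cost, param)))
    return updates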
Each time train is called, Theano will compute the gradient of the cost w.r.t. param and update param_update to a new update direction according to the momentum rule. Then, param will be updated by following the update direction saved in param_update with an appropriate learning_rate.
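In NumPy terms, a single call to train does roughly the following to each parameter (my own re-enactment with made-up names; grad stands for the value of T.grad(cost, param) at the current parameters). Note that Theano computes every update in the list from the values the shared variables held before the call, so param moves along the previous param_update while the new direction is stored for the next call.

import numpy as np

def one_train_step(param, param_update, grad, learning_rate, momentum):
    # Both right-hand sides use the values from *before* the call,
    # mirroring how Theano applies an updates list.
    new_param = param - learning_rate * param_update
    new_param_update = momentum * param_update + (1. - momentum) * grad
    return new_param, new_param_update

# Example: on the first call param does not move yet (param_update is 0),
# but a direction is stored and used on the second call.
p, v = np.array([1.0, -2.0]), np.zeros(2)
g = np.array([0.5, 0.5])                      # pretend gradient
p, v = one_train_step(p, v, g, learning_rate=0.1, momentum=0.9)
print(p, v)                                   # [ 1. -2.]  [0.05 0.05]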