The updates parameter of a Theano function takes a list of pairs, where each pair specifies a shared variable and the new expression for its value after the function's outputs have been computed. I wonder whether the updates are applied in any particular order. The order would matter if the new expressions of two shared variables depend on each other and the update procedure used an already-updated variable when updating the others that rely on it. For example, the list might look like this:
[(a, b + a), (b, b + 1)]
I have written some functions to test this. The results seem to show that the expression (the second term in each pair) is always evaluated with the old values when updating the shared variable in the first term, i.e.,
a_new = b_old + a_old
b_new = b_old + 1
Is this defined behavior?
However, I found the implementation of momentum here. This is the code that creates the param_update shared variable and builds the update list:
param_update = theano.shared(param.get_value()*0., broadcastable=param.broadcastable)
updates += [(param, param - learning_rate*param_update),
            (param_update, momentum * param_update + (1. - momentum)*T.grad(cost, param))]
With this code, param will not change in the first iteration, because param_update is all zeros. In my understanding, param_update should be updated first, and its new value should then be used to update param.
The updates always use the previous values (the values before the Theano function call), so what you observed is correct.
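To make the "old values only" rule concrete, here is a small plain-Python model of it. This is not Theano code: apply_updates, state, and the lambdas are hypothetical names made up for illustration, and the starting values 10 and 1 are arbitrary. It only sketches the semantics described above, where every new expression is evaluated before any variable is overwritten.

```python
def apply_updates(state, updates):
    """Evaluate every new-value expression against the OLD state, then assign.

    `state` maps variable names to values; `updates` is a list of
    (name, expression) pairs, where each expression is a function of the state.
    """
    # Step 1: compute all new values from the untouched old state.
    new_values = [(name, expr(state)) for name, expr in updates]
    # Step 2: only now write them back, into a copy of the state.
    new_state = dict(state)
    for name, value in new_values:
        new_state[name] = value
    return new_state


state = {"a": 10, "b": 1}
updates = [
    ("a", lambda s: s["b"] + s["a"]),  # a_new = b_old + a_old
    ("b", lambda s: s["b"] + 1),       # b_new = b_old + 1
]

result = apply_updates(state, updates)
swapped = apply_updates(state, list(reversed(updates)))
print(result)   # a uses the old b, not the incremented one
print(swapped)  # same values: the order of the pairs does not matter
```

Because nothing is written back until all expressions have been evaluated, listing the pairs in the opposite order gives the same result.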
For momentum, I think the delay is normal: param simply lags one step behind param_update.
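The delay can be seen in a plain-Python sketch that compares the two orderings. This is a simplified model, not the linked implementation: run and use_new_velocity are hypothetical names, the gradient is assumed to be a constant 1.0 (in real training it depends on param), and the learning rate and momentum values are arbitrary.

```python
def run(steps, use_new_velocity, lr=0.1, momentum=0.9, grad=1.0):
    """Scalar momentum updates with a constant gradient (an assumption).

    use_new_velocity=False mimics the update list from the question:
    param is computed from the OLD param_update value.
    use_new_velocity=True mimics what the asker expected:
    param is computed from the freshly computed param_update value.
    """
    param, velocity = 1.0, 0.0
    history = []
    for _ in range(steps):
        new_velocity = momentum * velocity + (1.0 - momentum) * grad
        step_velocity = new_velocity if use_new_velocity else velocity
        new_param = param - lr * step_velocity
        # Both variables are replaced together, as in a simultaneous update.
        param, velocity = new_param, new_velocity
        history.append(param)
    return history


delayed = run(3, use_new_velocity=False)
immediate = run(3, use_new_velocity=True)
print(delayed)    # first entry is still 1.0: no movement in iteration 1
print(immediate)  # moves already in iteration 1
```

Under the constant-gradient assumption the delayed run is the immediate run shifted by one iteration. If the delay is unwanted in Theano, one option is to build the new param_update expression once and use that same expression in both pairs, so that param is computed from the new value rather than the stored one.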