The documentation is not quite clear about this. I suppose the gradients one can obtain by opt.compute_gradients(E, [v]) contain ∂E/∂x = g(x) for each element x of the tensor that v stores. Does opt.apply_gradients(grads_and_vars) essentially execute x ← x - η·g(x), where η is the learning rate? That would imply that if I want to add a positive additive change p to the variable, I would need to change g(x) ← g(x) - (1/η)p, e.g. like this:
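    import tensorflow as tf

    # eta is the learning rate; it gets its own name so the loop index cannot shadow it.
    opt = tf.train.GradientDescentOptimizer(learning_rate=eta)
    grads_and_vars = opt.compute_gradients(loss, var_list)
    # Shift every gradient by -(1/eta) * p so that the descent step adds p.
    for i, (g, v) in enumerate(grads_and_vars):
        grads_and_vars[i] = (g - (1 / eta) * p, v)
    train_op = opt.apply_gradients(grads_and_vars)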
Is there a better way to do this?
compute_gradients returns a list of (gradient, variable) pairs where "gradient" is the gradient for "variable". Note that "gradient" can be a Tensor, an IndexedSlices, or None if there is no gradient for the given variable.
The training step is defined by tf.train.GradientDescentOptimizer(0.01).minimize(error). It aims to minimise the value of error, which is defined earlier as the square of the differences (a common error function).
Optimizers are classes that extend the base tf.train.Optimizer class and carry the extra information needed to train a specific model. An optimizer is constructed from its parameters (for example, a learning rate); no Tensor is needed at construction time. The choice of optimizer affects the speed and performance of training a specific model.
The update rule that the apply_gradients method actually applies depends on the specific optimizer. Take a look at the implementation of apply_gradients in the tf.train.Optimizer class. It relies on the derived classes implementing the update rule in the methods _apply_dense and _apply_sparse. The update rule you are referring to is implemented by the GradientDescentOptimizer.
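The split between a shared apply_gradients and a per-optimizer update rule can be sketched in plain Python, without TensorFlow. The names SketchOptimizer, SketchGradientDescent and _apply_one below are hypothetical stand-ins for tf.train.Optimizer, GradientDescentOptimizer and _apply_dense; this only illustrates the pattern and is not the library's implementation.

```python
class SketchOptimizer:
    """Hypothetical stand-in for a base optimizer class (not TensorFlow code)."""

    def apply_gradients(self, grads_and_vars):
        # Shared loop: skip variables without a gradient, delegate the rest.
        for grad, var in grads_and_vars:
            if grad is None:
                continue
            self._apply_one(grad, var)

    def _apply_one(self, grad, var):
        # Derived classes supply the actual update rule.
        raise NotImplementedError


class SketchGradientDescent(SketchOptimizer):
    """Implements x <- x - eta * g(x) for a one-element mutable variable."""

    def __init__(self, learning_rate):
        self.learning_rate = learning_rate

    def _apply_one(self, grad, var):
        # `var` is a one-element list so that the update happens in place.
        var[0] -= self.learning_rate * grad


x = [1.0]
unused = [5.0]  # paired with a None gradient, so it must stay untouched
SketchGradientDescent(learning_rate=0.5).apply_gradients([(2.0, x), (None, unused)])
print(x[0], unused[0])  # 0.0 5.0
```

A different optimizer would only need to override the per-variable method; the loop in apply_gradients stays the same.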
Regarding your desired positive additive update: if what you are calling opt is an instance of GradientDescentOptimizer, then you could indeed achieve what you want to do by
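    grads_and_vars = opt.compute_gradients(E, [v])
    eta = opt._learning_rate  # private attribute of the optimizer
    # Shift each gradient by -(1/eta) * p so that the descent step adds p.
    my_grads_and_vars = [(g - (1 / eta) * p, var) for g, var in grads_and_vars]
    opt.apply_gradients(my_grads_and_vars)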
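As a quick sanity check of the arithmetic behind this trick, here is a minimal plain-Python sketch for a single scalar element. All numeric values are illustrative assumptions, not taken from the question.

```python
eta = 0.25  # learning rate (illustrative value)
x = 3.0     # current value of one element of the variable
g = 2.0     # its gradient dE/dx (illustrative value)
p = 0.5     # desired additive change

plain_step = x - eta * g                      # ordinary gradient descent step
shifted_step = x - eta * (g - (1 / eta) * p)  # step with the modified gradient

print(plain_step, shifted_step, shifted_step - plain_step)  # 2.5 3.0 0.5
```

The modified step lands exactly p above the ordinary one. This relies on the plain gradient descent rule; optimizers that transform the gradient before applying it (momentum, Adam and so on) would not turn the same shift into a clean additive change of p.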
The more elegant way to do this is probably to write a new optimizer (inheriting from tf.train.Optimizer) that implements your desired update rule directly.
You can also use the eager execution API.
    import tensorflow as tf

    tf.enable_eager_execution()
    tfe = tf.contrib.eager

    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    grad = tfe.implicit_gradients(loss)
    optimizer.apply_gradients(grad(model_fn, val_list))
Here is a concrete example:
    import numpy as np
    import tensorflow as tf

    tf.enable_eager_execution()
    tfe = tf.contrib.eager

    W = tfe.Variable(np.random.randn())
    b = tfe.Variable(np.random.randn())

    def linear_regression(inputs):
        return inputs * W + b

    def MSE(model_fn, inputs, labels):
        # n_samples is the number of training examples, defined with your data.
        return tf.reduce_sum(tf.pow(model_fn(inputs) - labels, 2)) / (2 * n_samples)

    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
    grad = tfe.implicit_gradients(MSE)
    # train_X and train_Y are your input data and labels
    optimizer.apply_gradients(grad(linear_regression, train_X, train_Y))
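Note that a single apply_gradients call in the example above performs one update, so training means repeating it. To make that loop concrete, here is a rough plain-Python sketch of the same model and loss with hand-derived gradients instead of tfe.implicit_gradients. The toy data, the fixed starting values and the learning rate are assumptions chosen for illustration.

```python
# Hypothetical toy data generated from y = 2x + 1 (not from the original post).
train_X = [0.0, 1.0, 2.0, 3.0]
train_Y = [1.0, 3.0, 5.0, 7.0]
n_samples = len(train_X)

W, b = 0.0, 0.0      # fixed start instead of random, for reproducibility
learning_rate = 0.1  # illustrative value


def loss(W, b):
    # Same form as MSE in the example: sum of squared errors / (2 * n_samples).
    return sum((W * x + b - y) ** 2 for x, y in zip(train_X, train_Y)) / (2 * n_samples)


def gradients(W, b):
    # Hand-derived partial derivatives of the loss with respect to W and b.
    errors = [W * x + b - y for x, y in zip(train_X, train_Y)]
    dW = sum(e * x for e, x in zip(errors, train_X)) / n_samples
    db = sum(errors) / n_samples
    return dW, db


initial_loss = loss(W, b)
for _ in range(2000):
    dW, db = gradients(W, b)
    # One plain gradient descent update per iteration.
    W -= learning_rate * dW
    b -= learning_rate * db

print(round(W, 3), round(b, 3))  # 2.0 1.0
```

Each loop iteration plays the role of one optimizer.apply_gradients(grad(...)) call in the eager version.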