I am migrating my training loop to the TensorFlow 2.0 API. In eager execution mode, tf.GradientTape replaces tf.gradients. The question is: do they have the same functionality? Specifically:

In the function gradient():
- Is the parameter output_gradients equivalent to grad_ys in the old API?
- What about the parameters colocate_gradients_with_ops, aggregation_method, and gate_gradients of tf.gradients? Are they deprecated due to lack of use? Can they be replaced by other methods in the 2.0 API? Are they needed in eager execution at all?

Is the function jacobian() equivalent to tf.python.ops.parallel_for.gradients?
Please find the response below.
Output Gradients and grad_ys: Yes, they can be considered the same. Detailed Explanation: Information about output_gradients is mentioned in GitHub -> imperative_grad.py, as shown below:
output_gradients: if not None, a list of gradient provided for each Target, or None if we are to use the target's computed downstream gradient,
Information about grad_ys is mentioned on the TF site, as shown below:
grad_ys: is a list of tensors of the same length as ys that holds the initial gradients for each y in ys. When grad_ys is None, we fill in a tensor of '1's of the shape of y for each y in ys. A user can provide their own initial grad_ys to compute the derivatives using a different initial gradient for each y (e.g., if one wanted to weight the gradient differently for each value in each y).
From the above explanations, and from the code below from page 394 of the book Hands-On ML with Scikit-Learn & TensorFlow, we can conclude that the initial value of theta can be a random value, and we can pass it using the parameter output_gradients or grad_ys.
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0), name="theta")
gradients = tf.gradients(mse, [theta])[0]
training_op = tf.assign(theta, theta - learning_rate * gradients)
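For illustration, here is a minimal sketch (my own example, not from the book or the original answer) of how the weighting that grad_ys provided in tf.gradients can be passed to GradientTape.gradient() via output_gradients:

import tensorflow as tf

x = tf.Variable([1.0, 2.0, 3.0])

with tf.GradientTape() as tape:
    y = x * x                                  # dy_i/dx_i = 2 * x_i

# Without output_gradients the tape uses an initial gradient of ones,
# giving 2 * x = [2.0, 4.0, 6.0].
weights = tf.constant([1.0, 0.5, 0.0])         # per-element weighting of dy
grad = tape.gradient(y, x, output_gradients=weights)
print(grad.numpy())                            # [2.0, 2.0, 0.0]

# The graph-mode counterpart would be:
#   tf.compat.v1.gradients(y, x, grad_ys=[weights])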
colocate_gradients_with_ops: Yes, it is not needed for eager execution, as it is related to the control flow context of graphs. Detailed Explanation: colocate_gradients_with_ops points to the code below from GitHub -> ops.py. The control flow context is part of the concept of a graph context, as explained on the TF site -> Graphs.
def _colocate_with_for_gradient(self, op, gradient_uid,
                                ignore_existing=False):
  with self.colocate_with(op, ignore_existing):
    if gradient_uid is not None and self._control_flow_context is not None:
      self._control_flow_context.EnterGradientColocation(op, gradient_uid)
      try:
        yield
      finally:
        self._control_flow_context.ExitGradientColocation(op, gradient_uid)
    else:
      yield
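As a minimal sketch (my own example; the CPU device string is just an assumption so that it runs everywhere), note that in eager execution there is no colocation switch to set: the tape records the ops wherever they execute and computes gradients without any graph or control flow context:

import tensorflow as tf

x = tf.Variable(3.0)

with tf.device("/CPU:0"):          # explicit placement of the forward ops
    with tf.GradientTape() as tape:
        y = x * x + 2.0 * x

grad = tape.gradient(y, x)         # no colocate_gradients_with_ops parameter exists
print(grad.numpy())                # 2 * x + 2 = 8.0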
Regarding aggregation_method: The equivalent of this parameter has been implemented in 2.0, named _aggregate_grads, as shown in the GitHub link.
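As a minimal sketch (my own example) of the aggregation that _aggregate_grads performs implicitly: when a source contributes to the target along several paths, tape.gradient() simply sums the per-path gradients, and there is no public aggregation_method knob in eager mode:

import tensorflow as tf

x = tf.Variable(2.0)

with tf.GradientTape() as tape:
    a = 3.0 * x        # path 1: da/dx = 3
    b = x * x          # path 2: db/dx = 2 * x
    y = a + b          # the two paths meet here

grad = tape.gradient(y, x)
print(grad.numpy())     # 3 + 2 * x = 7.0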
Regarding gate_gradients: Not needed for eager execution, as this is also related to the graph context.
Detailed Explanation: As shown in the code below from GitHub -> gradients_util.py, if gate_gradients is True, then some operations are added to the graph using the function _colocate_with_for_gradient, which in turn depends on the control flow context of graphs.
if gate_gradients and len([x for x in in_grads if x is not None]) > 1:
  with ops.device(None):
    with ops._colocate_with_for_gradient(  # pylint: disable=protected-access
        None,
        gradient_uid,
        ignore_existing=True):
      in_grads = control_flow_ops.tuple(in_grads)
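For comparison, a minimal sketch (my own example, assuming TensorFlow 2.x with the v1 compatibility layer) of the flag on the graph-mode API; tape.gradient() takes no such argument, because there is no graph to add gated ops to:

import tensorflow as tf

g = tf.Graph()
with g.as_default():                      # TF1-style graph, just to show the flag
    x = tf.constant([1.0, 2.0])
    y = x * x + 3.0 * x
    # gate_gradients=True wraps the incoming gradients in control_flow_ops.tuple(),
    # as in the gradients_util.py snippet above.
    grads = tf.compat.v1.gradients(y, [x], gate_gradients=True)

with tf.compat.v1.Session(graph=g) as sess:
    print(sess.run(grads))                # [array([5., 7.], dtype=float32)], i.e. 2 * x + 3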
jacobian: Yes, they are the same.
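A minimal sketch (my own example) of GradientTape.jacobian(); the internal tf.python.ops.parallel_for.gradients.jacobian computes the same quantity, and tape.jacobian() is in fact built on the same parallel_for (pfor) machinery:

import tensorflow as tf

x = tf.Variable([1.0, 2.0, 3.0])

with tf.GradientTape() as tape:
    y = x * x                  # y_i = x_i ** 2

jac = tape.jacobian(y, x)      # shape (3, 3), jac[i, j] = d y_i / d x_j
print(jac.numpy())
# [[2. 0. 0.]
#  [0. 4. 0.]
#  [0. 0. 6.]]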