
TensorFlow's gradient_override_map function

Can someone explain the gradient_override_map function in TensorFlow to me? I couldn't understand its usage precisely.

I see code usage as:

with G.gradient_override_map({"Floor": "Identity"}):
    return tf.reduce_mean(SomeVals) * SomeOtherVal

What exactly is happening here? What is Identity?

asked by mac_i

2 Answers

Both "Floor" and "Identity" are type strings of operations, the former is corresponding to tf.floor while the latter tf.identity. So the function of your code, I guess, is to substitute tf.identity's back-propagated gradient(BPG for short) calculation mechanism for BPG calculation mechanism of tf.floor operations within graph G while passing forward output of tf.reduce_mean. It seems a little weird since in all applications of gradient_override_map I've found so far, the key of op_type_map is always identical to the type string of the operation used to produce an output in the context. By this I mean I'm more familiar with scenarios with tf.floor(SomeVals) returned, instead of tf.reduce_mean(SomeVals).

What gradient_override_map({op_A_type: op_B_type}) does is replace op_A's BPG calculation mechanism with op_B's, while keeping op_A_type's forward-propagation calculation mechanism. A common application of gradient_override_map is shown in lahwran's answer.

@tf.RegisterGradient("CustomGrad")
def _const_mul_grad(unused_op, grad):
    return 5.0 * grad

g = tf.get_default_graph()
with g.gradient_override_map({"Identity": "CustomGrad"}):
    output = tf.identity(input, name="Identity")

In

@tf.RegisterGradient("CustomGrad")
def _const_mul_grad(unused_op, grad):
    return 5.0 * grad

the decorator tf.RegisterGradient("CustomGrad") registers the gradient function defined by _const_mul_grad(unused_op, grad) for a customized op type, "CustomGrad",

while

g = tf.get_default_graph()
with g.gradient_override_map({"Identity": "CustomGrad"}):
    output = tf.identity(input, name="Identity") 

ensures that the outputs of all operations (in graph g) with type string "Identity" (tf.identity) are unchanged, while the BPG calculation mechanism of each tf.identity is replaced by the BPG calculation mechanism of the operation registered under the type string "CustomGrad".
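A minimal sketch to verify this, assuming TF 1.x and that the "CustomGrad" registration above has already run:

x = tf.constant(3.0)
g = tf.get_default_graph()
with g.gradient_override_map({"Identity": "CustomGrad"}):
    y = tf.identity(x)

grad = tf.gradients(y, x)[0]
with tf.Session() as sess:
    print(sess.run(y))     # 3.0 -- forward pass unchanged
    print(sess.run(grad))  # 5.0 -- Identity's usual gradient of 1.0, replaced by CustomGrad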

P.S.

  1. The type string of an op corresponds to the OpDef.name field of the proto that defines the operation. To find an op's OpDef.name, please refer to MingXing's answer under this question, or see the short sketch after this list.

  2. It is not necessary to declare the name of the tf.identity operation, since the 'name' argument of tf.identity is optional.
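A quick way to check an op's type string (my own illustration, not part of the original answer; assumes TF 1.x):

import tensorflow as tf

x = tf.floor(tf.constant(1.5))
print(x.op.type)              # "Floor" -- the key gradient_override_map matches on
print(tf.identity(x).op.type) # "Identity"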

answered by Yilin He

As best as I can tell, gradient_override_map allows you to say "in this context, any time you would use the gradient of X, instead use the gradient of Y", which means you still need the gradient of Y to be the gradient you want to use.

This is an example I've seen floating around while looking for how this works:

@tf.RegisterGradient("CustomGrad")
def _const_mul_grad(unused_op, grad):
    return 5.0 * grad

g = tf.get_default_graph()
with g.gradient_override_map({"Identity": "CustomGrad"}):
    output = tf.identity(input, name="Identity")

cite: https://stackoverflow.com/a/43948872/1102705

RegisterGradient() allows you to register the gradient of a new op you're defining, which lets you have an op with the gradient you want and then use that op in the gradient override map. It's kind of clunky: you're defining an op with no forward pass.
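For example, the same pattern can be used to clip gradients flowing through part of a graph. This is my own variation for illustration ("ClipGrad" is a made-up type string), not something from the answers above:

import tensorflow as tf

@tf.RegisterGradient("ClipGrad")
def _clip_grad(unused_op, grad):
    # Backward pass only: clip the incoming gradient to [-0.1, 0.1].
    return tf.clip_by_value(grad, -0.1, 0.1)

x = tf.constant([2.0, -3.0])
g = tf.get_default_graph()
with g.gradient_override_map({"Identity": "ClipGrad"}):
    y = tf.identity(x)  # forward: y == x; backward: gradient clipped

loss = tf.reduce_sum(y * y)        # d(loss)/dy = 2*y = [4.0, -6.0]
grad_x = tf.gradients(loss, x)[0]  # after clipping: [0.1, -0.1]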

Something I'm not clear on is whether the name="Identity" is actually necessary. (The first answer's P.S. suggests it isn't: the override map keys on the op's type string, not its name.)

answered by lahwran