
What is the `gate_gradients` argument of the TensorFlow `minimize()` function in the Optimizer class?

This is the link to the TF Optimizer class: https://www.tensorflow.org/versions/r0.12/api_docs/python/train/optimizers

asked Apr 04 '17 by Shamane Siriwardhana


2 Answers

GATE_NONE: Take the simple case of a matmul op on two vectors `x` and `y`, and let the output be `L`. The gradient of `L` with respect to `x` is `y`, and the gradient of `L` with respect to `y` is `xᵀ` (x transpose). With GATE_NONE it could happen that the gradient with respect to `x` is applied to modify `x` before the gradient for `y` is even calculated. When the gradient with respect to `y` is then computed, it is computed from the modified `x`, which is an error. Of course this won't happen in such a simple case, but you can imagine it happening in more complex/extreme cases.
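The race described above can be sketched in plain Python (a hypothetical simulation, not TensorFlow), using a scalar loss `L = x * y` where `dL/dx = y` and `dL/dy = x`:

```python
# Hypothetical pure-Python sketch (not TensorFlow) of the GATE_NONE race
# for L = x * y, where dL/dx = y and dL/dy = x.

def ungated_update(x, y, lr):
    # GATE_NONE-like behaviour: the gradient for x is applied before
    # the gradient for y has been computed.
    grad_x = y            # dL/dx
    x = x - lr * grad_x   # x is modified in place...
    grad_y = x            # ...so dL/dy is read from the *modified* x
    y = y - lr * grad_y
    return x, y

def gated_update(x, y, lr):
    # GATE_OP-like behaviour: both gradients for the op are computed
    # before either is applied.
    grad_x = y            # dL/dx, from the original values
    grad_y = x            # dL/dy, from the original values
    return x - lr * grad_x, y - lr * grad_y

print(ungated_update(2.0, 3.0, 0.1))
print(gated_update(2.0, 3.0, 0.1))
```

Running both shows the updated `y` differs between the two schemes, even though both start from the same values: the ungated version has silently used a stale-free but already-modified `x`.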

GATE_OP: For each op, make sure all gradients are computed before they are used. This prevents race conditions for ops that generate gradients for multiple inputs where the gradients depend on the inputs. (You can see how this prevents the GATE_NONE problem, though at the price of some parallelism.)

GATE_GRAPH: Make sure all gradients for all variables are computed before any one of them is used. This provides the least parallelism but can be useful if you want to process all gradients before applying any of them. (An example use case is clipping gradients according to their global norm before applying them.)
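The global-norm clipping use case can be sketched in plain Python (a hypothetical standalone `clip_by_global_norm`, not the TF op). It shows why GATE_GRAPH is needed there: the clipping factor depends on every gradient at once, so all of them must exist before any is applied.

```python
import math

# Hypothetical pure-Python sketch of global-norm gradient clipping.
# The scale factor depends on *all* gradients, so none of them can be
# applied until all of them have been computed (the GATE_GRAPH case).

def clip_by_global_norm(grads, clip_norm):
    # global norm = sqrt of the sum of squares across all gradients
    global_norm = math.sqrt(sum(g * g for g in grads))
    if global_norm <= clip_norm:
        return grads, global_norm
    scale = clip_norm / global_norm
    return [g * scale for g in grads], global_norm

clipped, norm = clip_by_global_norm([3.0, 4.0], 1.0)
print(clipped, norm)
```

With gradients `[3.0, 4.0]` the global norm is 5.0, so both gradients are scaled down by the same factor before any of them touches a variable.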

answered Oct 20 '22 by MiloMinderbinder


On the same page that you linked, if you scroll down a little, it says:

gate_gradients argument that controls the degree of parallelism during the application of the gradients

answered Oct 20 '22 by Ali