Here is the link to the TF optimizer class: https://www.tensorflow.org/versions/r0.12/api_docs/python/train/optimizers
GATE_NONE: Take the simple case of a matmul op on two vectors 'x' and 'y', and let the output be L. The gradient of L w.r.t. x is y, and the gradient of L w.r.t. y is x^T (x transpose). With GATE_NONE it could happen that the gradient w.r.t. x is applied to modify x before the gradient w.r.t. y is even calculated. When the gradient w.r.t. y is then computed, it would be based on the already-modified x, which is an error. Of course this won't happen in such a simple case, but you can imagine it happening in more complex/extreme graphs.
GATE_OP: For each op, make sure all of its gradients are computed before any of them are used. This prevents race conditions for ops that generate gradients for multiple inputs where the gradients depend on those inputs. (You can see how this avoids the GATE_NONE problem above, though at the price of some parallelism.)
GATE_GRAPH: Make sure all gradients for all variables are computed before any one of them is used. This provides the least parallelism but can be useful if you want to process all gradients together before applying any of them (an example use case is clipping gradients according to their global norm before applying them, as sketched below).
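Here is a minimal sketch of that GATE_GRAPH use case, written against the TF 1.x-style API from the linked page; the variable values, learning rate, and clip norm are made-up for illustration:

```python
import tensorflow as tf

# Toy graph: L = x . y, so dL/dx = y and dL/dy = x (values are arbitrary).
x = tf.Variable([1.0, 2.0], name="x")
y = tf.Variable([3.0, 4.0], name="y")
loss = tf.reduce_sum(x * y)

optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)

# GATE_GRAPH: every gradient is computed before any of them is used,
# so it is safe to look at all of them jointly, e.g. to clip by global norm.
grads_and_vars = optimizer.compute_gradients(
    loss, gate_gradients=tf.train.Optimizer.GATE_GRAPH)
grads, variables = zip(*grads_and_vars)
clipped_grads, _ = tf.clip_by_global_norm(list(grads), clip_norm=5.0)
train_op = optimizer.apply_gradients(zip(clipped_grads, variables))
```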
On the same page you linked, if you scroll down a little, it says:
gate_gradients argument that controls the degree of parallelism during the application of the gradients
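For example (reusing the loss and optimizer from the sketch above), you can pass that argument directly to minimize():

```python
# GATE_OP is the default; GATE_NONE and GATE_GRAPH trade parallelism for safety.
train_op = optimizer.minimize(loss, gate_gradients=tf.train.Optimizer.GATE_OP)
```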