The following code I've written fails at self.optimizer.compute_gradients(self.output, all_variables):
import tensorflow as tf
import tensorlayer as tl
from tensorflow.python.framework import ops
import numpy as np
class Network1():
    def __init__(self):
        ops.reset_default_graph()
        tl.layers.clear_layers_name()
        self.sess = tf.Session()
        self.optimizer = tf.train.AdamOptimizer(learning_rate=0.1)
        self.input_x = tf.placeholder(tf.float32, shape=[None, 784], name="input")
        input_layer = tl.layers.InputLayer(self.input_x)
        relu1 = tl.layers.DenseLayer(input_layer, n_units=800, act=tf.nn.relu, name="relu1")
        relu2 = tl.layers.DenseLayer(relu1, n_units=500, act=tf.nn.relu, name="relu2")
        self.output = relu2.all_layers[-1]
        all_variables = relu2.all_layers
        self.gradient = self.optimizer.compute_gradients(self.output, all_variables)
        init_op = tf.initialize_all_variables()
        self.sess.run(init_op)
It fails with:
TypeError: Argument is not a tf.Variable: Tensor("relu1/Relu:0", shape=(?, 800), dtype=float32)
However, when I change that line to tf.gradients(self.output, all_variables), the code works fine, or at least no error is reported. What did I miss? I thought these two methods do essentially the same thing, namely return a list of (gradient, variable) pairs.
optimizer.compute_gradients wraps tf.gradients(), as you can see here. It performs additional assertions, which explains your error: every entry in var_list must be a tf.Variable, but relu2.all_layers contains the layers' output tensors (such as relu1/Relu:0), not variables.
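A minimal sketch of the fix, if the goal is gradients with respect to the network's weights: pass the layer parameters rather than the layer outputs. This assumes TensorLayer's all_params attribute, which holds the network's tf.Variable parameters.
all_variables = relu2.all_params  # tf.Variable weights and biases, not output tensors
self.gradient = self.optimizer.compute_gradients(self.output, var_list=all_variables)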
I would like to add a simple point to the above answer. optimizer.compute_gradients returns a list of (gradient, variable) pairs. The variable is always present, but the gradient may be None: the gradient of the loss with respect to a variable in var_list is None when the loss does not depend on that variable.
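A small self-contained sketch of that behaviour (TF1 graph mode; the variables a and b and the toy loss are made up for illustration):
import tensorflow as tf

a = tf.Variable(1.0, name="a")
b = tf.Variable(2.0, name="b")   # the loss below does not depend on b
loss = tf.square(a)

opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)
grads_and_vars = opt.compute_gradients(loss, var_list=[a, b])
# -> [(d(loss)/da tensor, a), (None, b)]   gradient for b is None: no dependency

grads = tf.gradients(loss, [a, b])
# -> [d(loss)/da tensor, None]             only gradients, variables not attached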
On the other hand, tf.gradients only returns a list of sum(dy/dx) tensors, one per variable, with no variables attached. To apply a gradient update, it must be paired with the corresponding variable list.
Hence, the following two approaches can be used interchangeably:
### Approach 1 ###
variable_list = desired_list_of_variables
gradients = optimizer.compute_gradients(loss, var_list=variable_list)
optimizer.apply_gradients(gradients)
### Approach 2 ###
variable_list = desired_list_of_variables
gradients = tf.gradients(loss, variable_list)
optimizer.apply_gradients(zip(gradients, variable_list))
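For completeness, a minimal runnable sketch of Approach 1 inside a session (the toy loss and variable names are illustrative only):
import tensorflow as tf

x = tf.Variable(3.0, name="x")
loss = tf.square(x - 1.0)

optimizer = tf.train.AdamOptimizer(learning_rate=0.1)
variable_list = [x]
gradients = optimizer.compute_gradients(loss, var_list=variable_list)  # Approach 1
train_op = optimizer.apply_gradients(gradients)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(200):
        sess.run(train_op)
    print(sess.run(x))  # converges towards 1.0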