 

Tensorflow 2.0 doesn't compute the gradient

I want to visualize the patterns that a given feature map in a CNN has learned (in this example I'm using VGG16). To do so I create a random image, feed it through the network up to the desired convolutional layer, choose the feature map and find the gradients with respect to the input. The idea is to change the input in a way that maximizes the activation of the desired feature map. Using TensorFlow 2.0 I have a GradientTape that follows the computation and then computes the gradient, but tape.gradient returns None. Why is it unable to compute the gradient?

import tensorflow as tf
import matplotlib.pyplot as plt
import time
import numpy as np
from tensorflow.keras.applications import vgg16

class maxFeatureMap():

    def __init__(self, model):

        self.model = model
        self.optimizer = tf.keras.optimizers.Adam()

    def getNumLayers(self, layer_name):

        for layer in self.model.layers:
            if layer.name == layer_name:
                weights = layer.get_weights()
                num = weights[1].shape[0]
        return ("There are {} feature maps in {}".format(num, layer_name))

    def getGradient(self, layer, feature_map):

        pic = vgg16.preprocess_input(np.random.uniform(size=(1,96,96,3))) ## Creates values between 0 and 1
        pic = tf.convert_to_tensor(pic)

        model = tf.keras.Model(inputs=self.model.inputs, 
                               outputs=self.model.layers[layer].output)
        with tf.GradientTape() as tape:
            ## predicts the output of the model and only chooses the feature_map indicated
            predictions = model.predict(pic, steps=1)[0][:,:,feature_map]
            loss = tf.reduce_mean(predictions)
        print(loss)
        gradients = tape.gradient(loss, pic[0])
        print(gradients)
        self.optimizer.apply_gradients(zip(gradients, pic))

model = vgg16.VGG16(weights='imagenet', include_top=False)


x = maxFeatureMap(model)
x.getGradient(1, 24)
asked Jul 06 '19 by Will



2 Answers

This is a common pitfall with GradientTape: the tape only traces tensors that are explicitly "watched", and by default it watches only trainable variables (i.e. tf.Variable objects created with trainable=True). To watch the pic tensor, add tape.watch(pic) as the very first line inside the tape context.

Also, I'm not sure whether the indexing (pic[0]) will work, so you might want to remove it; since pic has just one entry in the first dimension it shouldn't matter anyway.

Furthermore, you cannot use model.predict here because it returns a NumPy array, which breaks the computation graph, so gradients cannot be backpropagated. Simply use the model as a callable instead, i.e. predictions = model(pic).
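Putting those three fixes together, a minimal sketch of the corrected gradient computation could look like this (layer index 1 and feature map 24 are just the example values from the question):

import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import vgg16

base = vgg16.VGG16(weights='imagenet', include_top=False)
model = tf.keras.Model(inputs=base.inputs, outputs=base.layers[1].output)

# random input image, converted to a float32 tensor
pic = tf.convert_to_tensor(
    vgg16.preprocess_input(np.random.uniform(size=(1, 96, 96, 3))),
    dtype=tf.float32)

with tf.GradientTape() as tape:
    tape.watch(pic)                        # tell the tape to trace the input tensor
    predictions = model(pic)[0][:, :, 24]  # call the model directly, not model.predict
    loss = tf.reduce_mean(predictions)

gradients = tape.gradient(loss, pic)       # gradient w.r.t. the whole input tensor
print(gradients.shape)                     # (1, 96, 96, 3)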

answered Oct 10 '22 by xdurch0


Did you define your own loss function? Did you convert a tensor to NumPy inside that loss function?

As a beginner, I ran into the same problem: tape.gradient(loss, variables) returned None because I had converted a tensor to a NumPy array inside my own loss function. It seems to be a silly but common beginner mistake.
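For illustration, here is a tiny made-up example of that pitfall: calling .numpy() inside the tape context leaves the TensorFlow graph, so the gradient comes back as None, while keeping everything as TensorFlow ops works.

import tensorflow as tf

x = tf.Variable(3.0)

with tf.GradientTape() as tape:
    y = tf.constant((x * x).numpy())   # converting to NumPy breaks the trace back to x
    loss = y + 1.0

print(tape.gradient(loss, x))          # None

with tf.GradientTape() as tape:
    loss = x * x + 1.0                 # pure TensorFlow ops keep the tape intact

print(tape.gradient(loss, x))          # tf.Tensor(6.0, shape=(), dtype=float32)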

answered Oct 10 '22 by LuTan