
Guided Back-propagation in TensorFlow

I would like to implement in TensorFlow the technique of "guided back-propagation" introduced in this paper, which is described in this recipe.

Computationally, that means that when I compute the gradient of, e.g., the input w.r.t. the output of the NN, I will have to modify the gradients computed at every ReLU unit. Concretely, the back-propagated signal at those units must be thresholded at zero to make this technique work. In other words, the partial derivatives at the ReLUs that are negative must be discarded.
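To make the rule concrete, here is a tiny numpy sketch of what I mean (the numbers are made up): plain back-propagation masks by the sign of the ReLU's forward input, while guided back-propagation additionally drops negative incoming gradients.

import numpy as np

x = np.array([-1.0, 2.0, 3.0])         # forward input to a ReLU
grad_out = np.array([0.5, -0.7, 1.2])  # gradient arriving from the layer above

plain  = (x > 0) * grad_out                    # ordinary ReLU back-propagation
guided = (x > 0) * (grad_out > 0) * grad_out   # also drop negative gradients

print(plain)   # [ 0.  -0.7  1.2]
print(guided)  # [ 0.   0.   1.2]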

Given that I am interested in applying these gradient computations only to test examples, i.e., I don't want to update the model's parameters, how can I do it?

I tried (unsuccessfully) two things so far:

  1. Use tf.py_func to wrap my simple numpy version of a ReLU, whose gradient operation can then be redefined via the g.gradient_override_map context manager.

  2. Gather the forward/backward values during back-propagation and apply the thresholding to those stemming from ReLUs.

I failed with both approaches because they require some knowledge of the internals of TF that currently I don't have.

Can anyone suggest any other route, or sketch the code?

Thanks a lot.


Peter


2 Answers

The better solution is your approach 1, using ops.RegisterGradient and tf.Graph.gradient_override_map. Together they override the gradient computation of a pre-defined op, e.g. Relu within the gradient_override_map context, using only Python code:

from tensorflow.python.framework import ops
from tensorflow.python.ops import gen_nn_ops
import tensorflow as tf

@ops.RegisterGradient("GuidedRelu")
def _GuidedReluGrad(op, grad):
    # Standard ReLU gradient, kept only where the incoming gradient is positive.
    return tf.where(0. < grad,
                    gen_nn_ops._relu_grad(grad, op.outputs[0]),
                    tf.zeros(grad.get_shape()))

...
with g.gradient_override_map({'Relu': 'GuidedRelu'}):
    y = tf.nn.relu(x)

Here is a full example implementation of guided ReLU: https://gist.github.com/falcondai/561d5eec7fed9ebf48751d124a77b087

Update: in TensorFlow >= 1.0, tf.select was renamed to tf.where. I updated the snippet accordingly. (Thanks @sbond for bringing this to my attention :)
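For context, here is a minimal end-to-end sketch of how the override is typically used to get guided gradients on test inputs; the toy model, shapes and names below are my own (assuming TF 1.x graph mode and that GuidedRelu has been registered as above), not part of the original answer.

import numpy as np
import tensorflow as tf

g = tf.get_default_graph()
x = tf.placeholder(tf.float32, [None, 784])

# Ops created inside this context look up the "GuidedRelu" gradient instead of
# the standard Relu gradient; the forward pass is unchanged.
with g.gradient_override_map({'Relu': 'GuidedRelu'}):
    h = tf.nn.relu(tf.layers.dense(x, 128))
    logits = tf.layers.dense(h, 10)

score = tf.reduce_max(logits, axis=1)   # class score to explain
saliency = tf.gradients(score, x)[0]    # guided gradient w.r.t. the input

# Only gradients are evaluated here; no parameters are updated.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    test_batch = np.random.rand(4, 784).astype(np.float32)
    sal = sess.run(saliency, feed_dict={x: test_batch})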


Falcon


tf.gradients has a grad_ys parameter that can be used for this purpose. Suppose your network has just one ReLU layer, as follows:

before_relu = f1(inputs, params)
after_relu = tf.nn.relu(before_relu)
loss = f2(after_relu, params, targets)

First, compute the derivative up to after_relu.

Dafter_relu = tf.gradients(loss, after_relu)[0]

Then threshold the gradient that you send further down (using tf.where, which replaced tf.select in TensorFlow >= 1.0).

Dafter_relu_thresholded = tf.where(Dafter_relu < 0.0, tf.zeros_like(Dafter_relu), Dafter_relu)

Finally, compute the actual gradients w.r.t. params.

Dparams = tf.gradients(after_relu, params, grad_ys=Dafter_relu_thresholded)

You can easily extend this same method to a network with many ReLU layers.
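For instance, here is a rough sketch of that extension on a toy two-ReLU network (the model and names below are my own, not from the answer, and the guided gradient is taken w.r.t. the input, as the question asks):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])
h1 = tf.nn.relu(tf.layers.dense(x, 256))
h2 = tf.nn.relu(tf.layers.dense(h1, 128))
logits = tf.layers.dense(h2, 10)
score = tf.reduce_max(logits, axis=1)

# Gradient of the score down to the last ReLU output, with negatives dropped.
d_h2 = tf.gradients(score, h2)[0]
d_h2 = tf.where(d_h2 < 0.0, tf.zeros_like(d_h2), d_h2)

# Propagate the thresholded signal one ReLU further down, then threshold again.
d_h1 = tf.gradients(h2, h1, grad_ys=d_h2)[0]
d_h1 = tf.where(d_h1 < 0.0, tf.zeros_like(d_h1), d_h1)

# Finally, the guided gradient w.r.t. the input.
d_x = tf.gradients(h1, x, grad_ys=d_h1)[0]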


keveman