
Unaggregated gradients / gradients per example in tensorflow

Tags:

tensorflow

Given a simple mini-batch gradient descent problem on MNIST in TensorFlow (like in this tutorial), how can I retrieve the gradients for each example in the batch individually?

tf.gradients() seems to return gradients averaged over all examples in the batch. Is there a way to retrieve gradients before aggregation?

Edit: A first step towards an answer is figuring out at which point TensorFlow averages the gradients over the examples in the batch. I thought this happened in _AggregatedGrads, but that doesn't appear to be the case. Any ideas?

asked Mar 01 '16 by Bas


3 Answers

tf.gradients returns the gradient of the loss with respect to the variables you pass in. This means that if your loss is a sum of per-example losses, then the gradient is also the sum of per-example loss gradients.

The summing up is implicit. For instance, if you want to minimize the sum of squared norms of the Wx-y errors, the gradient with respect to W is 2(WX-Y)X', where X is the batch of observations and Y is the batch of labels. You never explicitly form "per-example" gradients that you later sum up, so it's not a simple matter of removing some stage in the gradient pipeline.
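
For illustration, here is a small numpy check (mine, not part of the original answer) that the batch gradient 2(WX-Y)X' of the summed squared error is exactly the sum of the per-example gradients 2(Wx_i - y_i)x_i':

    # Quick numpy check that the batch gradient equals the sum of
    # per-example gradients for the sum-of-squared-errors loss.
    import numpy as np

    rng = np.random.RandomState(0)
    W = rng.randn(3, 5)
    X = rng.randn(5, 4)   # 4 examples, one per column
    Y = rng.randn(3, 4)

    batch_grad = 2 * (W @ X - Y) @ X.T
    per_example_sum = sum(2 * np.outer(W @ X[:, i] - Y[:, i], X[:, i])
                          for i in range(4))
    print(np.allclose(batch_grad, per_example_sum))   # True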

A simple way to get k per-example loss gradients is to use batches of size 1 and do k passes. Ian Goodfellow wrote up how to get all k gradients in a single pass; for that you would need to specify the gradients explicitly rather than rely on the tf.gradients method.
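
As a concrete illustration of the batch-of-size-1 approach, here is a minimal TF 1.x-style sketch (the softmax-regression model and placeholder names below are assumptions loosely following the MNIST tutorial, not the asker's actual code):

    # Minimal TF 1.x sketch of the "k passes with batch size 1" approach.
    # The model below is assumed for illustration.
    import numpy as np
    import tensorflow as tf

    x = tf.placeholder(tf.float32, [None, 784])
    y_ = tf.placeholder(tf.float32, [None, 10])
    W = tf.Variable(tf.zeros([784, 10]))
    b = tf.Variable(tf.zeros([10]))
    logits = tf.matmul(x, W) + b
    loss = tf.reduce_sum(
        tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))

    grads = tf.gradients(loss, [W, b])   # sums over whatever batch is fed

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        batch_x = np.random.rand(5, 784).astype(np.float32)              # dummy batch
        batch_y = np.eye(10)[np.random.randint(10, size=5)].astype(np.float32)
        per_example_grads = []
        for i in range(5):
            # feeding a single example makes the implicit sum a no-op
            g = sess.run(grads, feed_dict={x: batch_x[i:i + 1],
                                           y_: batch_y[i:i + 1]})
            per_example_grads.append(g)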

answered Oct 13 '22 by Yaroslav Bulatov


To partly answer my own question after tinkering with this for a while: it appears that it is possible to manipulate gradients per example while still working in batch, by doing the following:

  • Create a copy of tf.gradients() that accepts an extra tensor/placeholder with example-specific factors
  • Create a copy of _AggregatedGrads() and add a custom aggregation method that uses the example-specific factors
  • Call your custom tf.gradients function and give your loss as a list of slices:

    custagg_gradients(
        ys=[cross_entropy[i] for i in xrange(batch_size)],
        xs=variables.trainable_variables(),
        aggregation_method=CUSTOM,
        gradient_factors=gradient_factors)

But this will probably have the same complexity as doing individual passes per example, and I need to check if the gradients are correct :-).
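
For comparison, the same slicing idea can be expressed with the stock tf.gradients and no custom copy (a rough sketch; the model is assumed for illustration). It also makes clear why the cost grows with the batch size:

    # Rough TF 1.x sketch: one gradient op per example, built from an
    # unreduced per-example loss vector. The graph (and run time) grows
    # with batch_size, comparable to doing individual passes.
    import tensorflow as tf

    batch_size = 4
    x = tf.placeholder(tf.float32, [batch_size, 784])
    y_ = tf.placeholder(tf.float32, [batch_size, 10])
    W = tf.Variable(tf.zeros([784, 10]))
    b = tf.Variable(tf.zeros([10]))
    logits = tf.matmul(x, W) + b

    # shape [batch_size]: one loss per example, no reduction applied yet
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=y_,
                                                            logits=logits)

    # list of [grad_W, grad_b] pairs, one per example; all of them can be
    # evaluated in a single session.run call
    per_example_grads = [tf.gradients(cross_entropy[i], [W, b])
                         for i in range(batch_size)]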

answered Oct 13 '22 by Bas


One way of retrieving gradients before aggregation is to use the grad_ys parameter of tf.gradients. A good discussion is found here:

Use of grads_ys parameter in tf.gradients - TensorFlow
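
A hedged sketch of that idea (TF 1.x; the model and the placeholder name `selector` are made up for illustration): pass the vector of per-example losses as ys and a one-hot weighting as grad_ys, so the backward pass only picks up the selected example.

    # Select one example's gradient by weighting the per-example losses.
    import tensorflow as tf

    batch_size = 4
    x = tf.placeholder(tf.float32, [batch_size, 784])
    y_ = tf.placeholder(tf.float32, [batch_size, 10])
    W = tf.Variable(tf.zeros([784, 10]))
    b = tf.Variable(tf.zeros([10]))
    logits = tf.matmul(x, W) + b
    per_example_loss = tf.nn.softmax_cross_entropy_with_logits(labels=y_,
                                                               logits=logits)

    # one-hot selector: feed e.g. [1, 0, 0, 0] to get example 0's gradient
    selector = tf.placeholder(tf.float32, [batch_size])
    selected_grads = tf.gradients(per_example_loss, [W, b], grad_ys=selector)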

EDIT:

I haven't been working with TensorFlow much lately, but here is an open issue tracking the best way to compute unaggregated gradients:

https://github.com/tensorflow/tensorflow/issues/675

There are a number of sample-code solutions provided by users (including myself) that you can try, depending on your needs.

answered Oct 13 '22 by bremen_matt