Given a simple mini-batch gradient descent problem on mnist in tensorflow (like in this tutorial), how can I retrieve the gradients for each example in the batch individually.
tf.gradients()
seems to return gradients averaged over all examples in the batch. Is there a way to retrieve gradients before aggregation?
Edit: A first step towards this answer is figuring out at which point tensorflow averages the gradients over the examples in the batch. I thought this happened in _AggregatedGrads, but that doesn't appear to be the case. Any ideas?
tf.gradients
returns the gradient with respect to the loss. This means that if your loss is a sum of per-example losses, then the gradient is also the sum of per-example loss gradients.
The summing up is implicit. For instance if you want to minimize the sum of squared norms of Wx-y
errors, the gradient with respect to W
is 2(WX-Y)X'
where X
is the batch of observations and Y
is the batch of labels. You never explicitly form "per-example" gradients that you later sum up, so it's not a simple matter of removing some stage in the gradient pipeline.
A simple way to get k
per-example loss gradients is to use batches of size 1 and do k
passes. Ian Goodfellow wrote up how to get all k
gradients in a single pass, for this you would need to specify gradients explicitly and not rely on tf.gradients
method
To partly answer my own question after tinkering with this for a while. It appears that it is possible to manipulate gradients per example while still working in batch by doing the following:
custagg_gradients(
ys=[cross_entropy[i] for i in xrange(batch_size)],
xs=variables.trainable_variables(),
aggregation_method=CUSTOM,
gradient_factors=gradient_factors
)
But this will probably have the same complexity as doing individual passes per example, and I need to check if the gradients are correct :-).
One way of retrieving gradients before aggregation is to use the grads_ys
parameter. A good discussion is found here:
Use of grads_ys parameter in tf.gradients - TensorFlow
EDIT:
I haven't been working with Tensorflow a lot lately, but here is an open issue tracking the best way to compute unaggregated gradients:
https://github.com/tensorflow/tensorflow/issues/675
There is a lot of sample code solutions provided by users (including myself) that you can try based on your needs.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With