 

How to accumulate gradients in tensorflow?


I have a question similar to this one.

Because I have limited resources and I work with a deep model (VGG-16) - used to train a triplet network - I want to accumulate gradients over 128 batches of one training example each, and then propagate the error and update the weights.

It's not clear to me how to do this. I work with TensorFlow, but any implementation/pseudocode is welcome.

asked Oct 16 '17 by Hello Lili


People also ask

What is gradient accumulation?

One solution to this problem is gradient accumulation. The idea is to split up the batch into smaller mini-batches which are run sequentially, while accumulating their results. The accumulated results are used to update the model parameters only at the end of the last mini-batch.
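As a minimal, framework-agnostic sketch of that idea in plain Python (compute_gradients and apply_gradients here are hypothetical placeholder functions, not a specific API):

# Hypothetical sketch: sum gradients over the mini-batches, then update once.
def train_step(model, mini_batches, compute_gradients, apply_gradients):
    accumulated = None
    for batch in mini_batches:
        grads = compute_gradients(model, batch)        # per-mini-batch gradients
        if accumulated is None:
            accumulated = list(grads)
        else:
            accumulated = [a + g for a, g in zip(accumulated, grads)]
    # Average so the update matches one pass over the full batch
    apply_gradients(model, [a / len(mini_batches) for a in accumulated])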

How does gradient work in TensorFlow?

TensorFlow "records" relevant operations executed inside the context of a tf.GradientTape onto a "tape". TensorFlow then uses that tape to compute the gradients of a "recorded" computation using reverse-mode differentiation.
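For example, in TensorFlow 2 (a minimal, self-contained snippet):

import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x * x                  # operations are "recorded" on the tape
grad = tape.gradient(y, x)     # reverse-mode differentiation: dy/dx = 2x
print(grad.numpy())            # 6.0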

How do you do gradient accumulation PyTorch?

Coding the gradient accumulation part is also ridiculously easy in PyTorch. All you need to do is to store the loss at each batch and then update the model parameters only after a set number of batches that you choose. We hold off on calling optimizer.step(), which updates the parameters, for accumulation_steps batches.
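A minimal PyTorch sketch of that pattern (the model, optimizer, and random stand-in data below are illustrative, not from the original answer):

import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = torch.nn.MSELoss()
accumulation_steps = 4

optimizer.zero_grad()
for i in range(100):
    x, y = torch.randn(1, 10), torch.randn(1, 1)        # stand-in mini-batch
    loss = criterion(model(x), y) / accumulation_steps  # scale so the sum averages
    loss.backward()                                     # grads accumulate in .grad
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()                                # update once per 4 batches
        optimizer.zero_grad()                           # reset the accumulators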

Does TensorFlow use gradient descent?

At its core, TensorFlow is just an optimized library for tensor operations (vectors, matrices, etc.) and the calculus operations used to perform gradient descent on arbitrary sequences of calculations.
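For instance, a single gradient-descent step in TensorFlow 2 looks like this (toy example):

import tensorflow as tf

w = tf.Variable(5.0)
opt = tf.keras.optimizers.SGD(learning_rate=0.1)
with tf.GradientTape() as tape:
    loss = (w - 2.0) ** 2                 # minimum at w = 2
grads = tape.gradient(loss, [w])
opt.apply_gradients(zip(grads, [w]))      # w moves toward 2
print(w.numpy())                          # ≈ 4.4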


1 Answer

Let's walk through the code proposed in one of the answers you linked to:

## Optimizer definition - nothing different from any classical example
opt = tf.train.AdamOptimizer()

## Retrieve all trainable variables you defined in your graph
tvs = tf.trainable_variables()

## Create a list of variables with the same shape as the trainable ones,
## initialized with zeros
accum_vars = [tf.Variable(tf.zeros_like(tv.initialized_value()), trainable=False)
              for tv in tvs]
zero_ops = [tv.assign(tf.zeros_like(tv)) for tv in accum_vars]

## Call the optimizer's compute_gradients function to obtain the list of
## gradients (rmse is your loss tensor, defined elsewhere in the graph)
gvs = opt.compute_gradients(rmse, tvs)

## Add each gradient to the corresponding accumulator you initialized with
## zeros (works because accum_vars and gvs are in the same order)
accum_ops = [accum_vars[i].assign_add(gv[0]) for i, gv in enumerate(gvs)]

## Define the training step (the part that updates the variable values)
train_step = opt.apply_gradients([(accum_vars[i], gv[1]) for i, gv in enumerate(gvs)])

This first part basically adds new variables and ops to your graph, which will allow you to:

  1. Accumulate the gradients into the (list of) variables accum_vars with the accum_ops ops
  2. Update the model weights with the train_step op

Then, to use it when training, you have to follow these steps (still from the answer you linked):

## The training while-loop
while ...:
    # Run zero_ops to reset the accumulators
    sess.run(zero_ops)
    # Accumulate the gradients n_minibatches times in accum_vars using accum_ops
    for i in range(n_minibatches):
        sess.run(accum_ops, feed_dict={X: Xs[i], y: ys[i]})
    # Run the train_step op to update the weights based on the accumulated gradients
    sess.run(train_step)
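Two notes on this pattern. First, it sums the gradients rather than averaging them; to make the update equivalent to one large batch of n_minibatches examples, divide the accumulated gradients (or each per-batch loss) by n_minibatches before applying them. Second, the answer uses the TensorFlow 1.x graph API (tf.train, tf.trainable_variables, sess.run). A minimal sketch of the same pattern in TensorFlow 2 eager mode, with a toy stand-in model, loss, and data chosen purely for illustration:

import tensorflow as tf

# Toy stand-ins (assumptions): a small model, a loss, and 4 one-example batches
model = tf.keras.Sequential([tf.keras.Input(shape=(8,)), tf.keras.layers.Dense(1)])
loss_fn = tf.keras.losses.MeanSquaredError()
n_minibatches = 4
Xs = [tf.random.normal((1, 8)) for _ in range(n_minibatches)]
ys = [tf.random.normal((1, 1)) for _ in range(n_minibatches)]

opt = tf.keras.optimizers.Adam()
# One zeroed, non-trainable accumulator per trainable weight
accum_vars = [tf.Variable(tf.zeros_like(v), trainable=False)
              for v in model.trainable_variables]

for i in range(n_minibatches):
    with tf.GradientTape() as tape:
        loss = loss_fn(ys[i], model(Xs[i])) / n_minibatches  # scale so the sum averages
    grads = tape.gradient(loss, model.trainable_variables)
    for acc, g in zip(accum_vars, grads):
        acc.assign_add(g)                                    # accumulate

opt.apply_gradients(zip(accum_vars, model.trainable_variables))  # one update
for acc in accum_vars:
    acc.assign(tf.zeros_like(acc))                           # reset for the next cycle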
answered Oct 05 '22 by Pop