How to accumulate gradients in tensorflow 2.0?

I'm training a model with TensorFlow 2.0. The images in my training set have different resolutions. The model I've built can handle variable resolutions (conv layers followed by global averaging). My training set is very small, and I want to use the full training set in a single batch.

Since my images are of different resolutions, I can't use model.fit(). So, I'm planning to pass each sample through the network individually, accumulate the errors/gradients and then apply one optimizer step. I'm able to compute loss values, but I don't know how to accumulate the losses/gradients. How can I accumulate the losses/gradients and then apply a single optimizer step?

Code:

for i in range(num_epochs):
    print(f'Epoch: {i + 1}')
    total_loss = 0
    for j in tqdm(range(num_samples)):
        sample = samples[j]
        with tf.GradientTape() as tape:
            prediction = self.model(sample)
            loss_value = self.loss_function(y_true=labels[j], y_pred=prediction)
        gradients = tape.gradient(loss_value, self.model.trainable_variables)
        self.optimizer.apply_gradients(zip(gradients, self.model.trainable_variables))
        total_loss += loss_value

    epoch_loss = total_loss / num_samples
    print(f'Epoch loss: {epoch_loss}')

asked Jan 24 '20 09:01

Nagabhushan S N

2 Answers

If I understand correctly from this statement:

How can I accumulate the losses/gradients and then apply a single optimizer step?

@Nagabhushan wants to accumulate the gradients over all samples and then apply a single optimization step using the mean of the accumulated gradients. The answer provided by @TensorflowSupport does not answer that. To accumulate the gradients from several tapes and run the optimizer only once per epoch, you can do the following:

for i in range(num_epochs):
    print(f'Epoch: {i + 1}')
    total_loss = 0

    # get trainable variables
    train_vars = self.model.trainable_variables
    # Create empty gradient list (not a tf.Variable list)
    accum_gradient = [tf.zeros_like(this_var) for this_var in train_vars]

    for j in tqdm(range(num_samples)):
        sample = samples[j]
        with tf.GradientTape() as tape:
            prediction = self.model(sample)
            loss_value = self.loss_function(y_true=labels[j], y_pred=prediction)
        total_loss += loss_value

        # get gradients of this tape
        gradients = tape.gradient(loss_value, train_vars)
        # Accumulate the gradients
        accum_gradient = [(accum_grad + grad) for accum_grad, grad in zip(accum_gradient, gradients)]

    # Now, after executing all the tapes you needed, we apply the optimization step
    # (but first we take the average of the gradients)
    accum_gradient = [this_grad/num_samples for this_grad in accum_gradient]
    # apply optimization step
    self.optimizer.apply_gradients(zip(accum_gradient, train_vars))

    epoch_loss = total_loss / num_samples
    print(f'Epoch loss: {epoch_loss}')
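
As a sanity check on the averaging step above, here is a small pure-Python sketch (no TensorFlow). It assumes a hypothetical linear model w * x + b with squared error, and that the batch loss is the mean of the per-sample losses. Under those assumptions, the mean of the accumulated per-sample gradients should match the gradient of the full-batch loss, which is checked here against a finite-difference estimate. All function names are illustrative only.

```python
def sample_grad(w, b, x, y):
    """Analytic gradient of the squared error (w*x + b - y)**2 for one sample."""
    err = w * x + b - y
    return 2.0 * err * x, 2.0 * err


def mean_loss(w, b, xs, ys):
    """Full-batch loss: the mean of the per-sample squared errors."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, b, xs, ys):
    """Accumulate per-sample gradients one at a time, then average them."""
    acc_w, acc_b = 0.0, 0.0
    for x, y in zip(xs, ys):
        grad_w, grad_b = sample_grad(w, b, x, y)
        acc_w += grad_w
        acc_b += grad_b
    n = len(xs)
    return acc_w / n, acc_b / n


def numeric_grad(w, b, xs, ys, eps=1e-6):
    """Central finite-difference gradient of the full-batch loss."""
    grad_w = (mean_loss(w + eps, b, xs, ys) - mean_loss(w - eps, b, xs, ys)) / (2 * eps)
    grad_b = (mean_loss(w, b + eps, xs, ys) - mean_loss(w, b - eps, xs, ys)) / (2 * eps)
    return grad_w, grad_b


xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 7.0]
w, b = 0.5, 0.1
print(accumulated_grad(w, b, xs, ys))
print(numeric_grad(w, b, xs, ys))
```

The two printed pairs should agree up to floating-point error. The equivalence may not hold for layers whose output depends on the other samples in the batch (for example, batch normalization in training mode).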

Creating tf.Variable objects inside the training loop should be avoided, since it produces errors when the code is executed as a graph. If you call tf.Variable() inside your training function and then decorate that function with @tf.function, or wrap it with tf.function(my_train_fcn) to obtain a graph function (for example, for improved performance), execution will raise an error. This happens because tf.Variable() behaves differently under tracing than in eager execution: under tracing the variable would be reused, whereas eager execution creates a new one on every call. You can find more information on this in the TensorFlow documentation.
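
If you do want tf.Variable accumulators together with tf.function, one possible pattern is to create the variables once, outside the traced function, and only call assign_add inside it. The sketch below is a minimal illustration under that assumption, not a definitive implementation. The tiny model, the input shapes, and the names accumulate_step and train_epoch are hypothetical and not from the question.

```python
import tensorflow as tf

# Hypothetical stand-in for the asker's network: a conv layer followed by
# global average pooling, so the input resolution may vary per sample.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(4, 3, padding="same", activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1),
])
model(tf.zeros([1, 8, 8, 3]))  # call once so the model's variables exist

optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
loss_fn = tf.keras.losses.MeanSquaredError()

train_vars = model.trainable_variables
# Accumulators are created once, outside any tf.function.
accum_vars = [tf.Variable(tf.zeros(v.shape, dtype=v.dtype), trainable=False)
              for v in train_vars]


# A fixed input signature with unknown height/width avoids retracing
# for every new image resolution.
@tf.function(input_signature=[
    tf.TensorSpec(shape=[1, None, None, 3], dtype=tf.float32),
    tf.TensorSpec(shape=[1, 1], dtype=tf.float32),
])
def accumulate_step(sample, label):
    with tf.GradientTape() as tape:
        prediction = model(sample, training=True)
        loss_value = loss_fn(label, prediction)
    gradients = tape.gradient(loss_value, train_vars)
    # Only update existing variables here; no variable is created while tracing.
    for accum_var, grad in zip(accum_vars, gradients):
        accum_var.assign_add(grad)
    return loss_value


def train_epoch(samples, labels):
    # Reset the accumulators at the start of every epoch.
    for accum_var in accum_vars:
        accum_var.assign(tf.zeros_like(accum_var))
    total_loss = 0.0
    for sample, label in zip(samples, labels):
        total_loss += float(accumulate_step(sample, label))
    # Average the accumulated gradients and apply a single optimizer step.
    mean_grads = [accum_var / len(samples) for accum_var in accum_vars]
    optimizer.apply_gradients(zip(mean_grads, train_vars))
    return total_loss / len(samples)


# Two samples with different resolutions (random data, for illustration only).
samples = [tf.random.normal([1, 8, 8, 3]), tf.random.normal([1, 12, 10, 3])]
labels = [tf.constant([[1.0]]), tf.constant([[0.0]])]
epoch_loss = train_epoch(samples, labels)
print(f'Epoch loss: {epoch_loss}')
```

The optimizer step is kept outside the traced function here for simplicity; whether it is worth tracing it as well depends on the model and is left open.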

answered Sep 28 '22 01:09

Ramiro R.C.

Following the Stack Overflow answer and the explanation on the TensorFlow website, below is code for accumulating gradients in TensorFlow 2.0:

def train(epochs):
    for epoch in range(epochs):
        for batch, (images, labels) in enumerate(dataset):
            with tf.GradientTape() as tape:
                logits = mnist_model(images, training=True)
                tvs = mnist_model.trainable_variables
                # Non-trainable accumulators, one per trainable variable
                # (created anew for every batch in this version)
                accum_vars = [tf.Variable(tf.zeros_like(tv), trainable=False) for tv in tvs]
                zero_ops = [av.assign(tf.zeros_like(av)) for av in accum_vars]
                loss_value = loss_object(labels, logits)

            loss_history.append(loss_value.numpy().mean())
            grads = tape.gradient(loss_value, tvs)
            #print(grads[0].shape)
            #print(accum_vars[0].shape)
            # Add this batch's gradients to the accumulators
            accum_ops = [accum_vars[i].assign_add(grad) for i, grad in enumerate(grads)]

        # One optimizer step per epoch, using grads from the final batch
        optimizer.apply_gradients(zip(grads, mnist_model.trainable_variables))
        print('Epoch {} finished'.format(epoch))

# Call the above function
train(epochs=3)

The complete code can be found in this GitHub Gist.

answered Sep 28 '22 02:09

Tensorflow Support

Tags: python, tensorflow, tensorflow2.0
