TF 2.0: RuntimeError: GradientTape.gradient can only be called once on non-persistent tapes

In the DCGAN example in the TensorFlow 2.0 guide, there are two gradient tapes. See below.

@tf.function
def train_step(images):
    noise = tf.random.normal([BATCH_SIZE, noise_dim])

    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
      generated_images = generator(noise, training=True)

      real_output = discriminator(images, training=True)
      fake_output = discriminator(generated_images, training=True)

      gen_loss = generator_loss(fake_output)
      disc_loss = discriminator_loss(real_output, fake_output)

    gradients_of_generator = gen_tape.gradient(gen_loss, generator.trainable_variables)
    gradients_of_discriminator = disc_tape.gradient(disc_loss, discriminator.trainable_variables)

    generator_optimizer.apply_gradients(zip(gradients_of_generator, generator.trainable_variables))
    discriminator_optimizer.apply_gradients(zip(gradients_of_discriminator, discriminator.trainable_variables))

As you can see, there are two gradient tapes. I wondered what difference using a single tape would make, so I changed the code to the following:

@tf.function
def train_step(images):
    noise = tf.random.normal([BATCH_SIZE, noise_dim])

    with tf.GradientTape() as tape:
      generated_images = generator(noise, training=True)

      real_output = discriminator(images, training=True)
      fake_output = discriminator(generated_images, training=True)

      gen_loss = generator_loss(fake_output)
      disc_loss = discriminator_loss(real_output, fake_output)

    gradients_of_generator = tape.gradient(gen_loss, generator.trainable_variables)
    gradients_of_discriminator = tape.gradient(disc_loss, discriminator.trainable_variables)

    generator_optimizer.apply_gradients(zip(gradients_of_generator, generator.trainable_variables))
    discriminator_optimizer.apply_gradients(zip(gradients_of_discriminator, discriminator.trainable_variables))

This gives me the following error:

RuntimeError: GradientTape.gradient can only be called once on non-persistent tapes.

I would like to know why two tapes are necessary. As of now, the documentation on the TF 2.0 APIs is scanty. Can anyone explain, or point me to the right docs/tutorials?

asked May 10 '19 by Himaprasoon

2 Answers

From the documentation of GradientTape:

By default, the resources held by a GradientTape are released as soon as the GradientTape.gradient() method is called. To compute multiple gradients over the same computation, create a persistent gradient tape. This allows multiple calls to the gradient() method, as resources are released when the tape object is garbage collected.

A persistent tape can be created with with tf.GradientTape(persistent=True) as tape and should be manually deleted afterwards with del tape to release its resources (credits to @zwep and @Crispy13).
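For completeness, here is a sketch of the single-tape variant from the question with that fix applied, assuming the same generator, discriminator, losses, and optimizers as in your code. persistent=True permits the two gradient() calls, and del tape releases the held resources afterwards:

@tf.function
def train_step(images):
    noise = tf.random.normal([BATCH_SIZE, noise_dim])

    with tf.GradientTape(persistent=True) as tape:
        generated_images = generator(noise, training=True)

        real_output = discriminator(images, training=True)
        fake_output = discriminator(generated_images, training=True)

        gen_loss = generator_loss(fake_output)
        disc_loss = discriminator_loss(real_output, fake_output)

    # Both gradient() calls are now allowed because the tape is persistent.
    gradients_of_generator = tape.gradient(gen_loss, generator.trainable_variables)
    gradients_of_discriminator = tape.gradient(disc_loss, discriminator.trainable_variables)
    del tape  # release the tape's resources once both gradients are computed

    generator_optimizer.apply_gradients(zip(gradients_of_generator, generator.trainable_variables))
    discriminator_optimizer.apply_gradients(zip(gradients_of_discriminator, discriminator.trainable_variables))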

answered Nov 03 '22 by Sparky05

The technical reason is that gradient is called twice on the same tape, which is not allowed on non-persistent tapes.

In the present case, however, the underlying reason is that training GANs is typically done by alternating the optimization of the generator and the discriminator. Each optimization has its own optimizer, which typically operates on different variables, and even the loss being minimized is different (gen_loss and disc_loss in your code).

So you end up with two gradients because training GANs is essentially optimizing two different (adversarial) problems in an alternating fashion.
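To make that alternation explicit, the update can also be split into two separate steps, one per sub-problem, each with its own non-persistent tape. This is only a sketch reusing the names assumed from your code:

@tf.function
def discriminator_step(images):
    noise = tf.random.normal([BATCH_SIZE, noise_dim])
    with tf.GradientTape() as tape:
        fake_output = discriminator(generator(noise, training=True), training=True)
        real_output = discriminator(images, training=True)
        disc_loss = discriminator_loss(real_output, fake_output)
    # One gradient() call per tape, so a non-persistent tape suffices.
    grads = tape.gradient(disc_loss, discriminator.trainable_variables)
    discriminator_optimizer.apply_gradients(zip(grads, discriminator.trainable_variables))

@tf.function
def generator_step():
    noise = tf.random.normal([BATCH_SIZE, noise_dim])
    with tf.GradientTape() as tape:
        fake_output = discriminator(generator(noise, training=True), training=True)
        gen_loss = generator_loss(fake_output)
    grads = tape.gradient(gen_loss, generator.trainable_variables)
    generator_optimizer.apply_gradients(zip(grads, generator.trainable_variables))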

answered Nov 03 '22 by P-Gn