In the DCGAN example from the TensorFlow 2.0 guide, there are two gradient tapes. See below.
@tf.function
def train_step(images):
    noise = tf.random.normal([BATCH_SIZE, noise_dim])

    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        generated_images = generator(noise, training=True)

        real_output = discriminator(images, training=True)
        fake_output = discriminator(generated_images, training=True)

        gen_loss = generator_loss(fake_output)
        disc_loss = discriminator_loss(real_output, fake_output)

    gradients_of_generator = gen_tape.gradient(gen_loss, generator.trainable_variables)
    gradients_of_discriminator = disc_tape.gradient(disc_loss, discriminator.trainable_variables)

    generator_optimizer.apply_gradients(zip(gradients_of_generator, generator.trainable_variables))
    discriminator_optimizer.apply_gradients(zip(gradients_of_discriminator, discriminator.trainable_variables))
As you can clearly see, there are two gradient tapes. I was wondering what difference using a single tape would make, so I changed it to the following:
@tf.function
def train_step(images):
    noise = tf.random.normal([BATCH_SIZE, noise_dim])

    with tf.GradientTape() as tape:
        generated_images = generator(noise, training=True)

        real_output = discriminator(images, training=True)
        fake_output = discriminator(generated_images, training=True)

        gen_loss = generator_loss(fake_output)
        disc_loss = discriminator_loss(real_output, fake_output)

    gradients_of_generator = tape.gradient(gen_loss, generator.trainable_variables)
    gradients_of_discriminator = tape.gradient(disc_loss, discriminator.trainable_variables)

    generator_optimizer.apply_gradients(zip(gradients_of_generator, generator.trainable_variables))
    discriminator_optimizer.apply_gradients(zip(gradients_of_discriminator, discriminator.trainable_variables))
This gives me the following error:
RuntimeError: GradientTape.gradient can only be called once on non-persistent tapes.
I would like to know why two tapes are necessary. As of now, the documentation on the TF 2.0 APIs is scanty. Can anyone explain or point me to the right docs/tutorials?
From the documentation of GradientTape:
By default, the resources held by a GradientTape are released as soon as GradientTape.gradient() method is called. To compute multiple gradients over the same computation, create a persistent gradient tape. This allows multiple calls to the gradient() method as resources are released when the tape object is garbage collected.
A persistent gradient tape can be created with with tf.GradientTape(persistent=True) as tape, and can/should be manually deleted afterwards with del tape (credits for this to @zwep and @Crispy13).
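For instance, the single-tape version from the question would work if the tape is made persistent. This is only a sketch of a modified train_step, reusing the names defined above, not the official tutorial code:

@tf.function
def train_step(images):
    noise = tf.random.normal([BATCH_SIZE, noise_dim])

    # persistent=True lets us call tape.gradient() more than once
    with tf.GradientTape(persistent=True) as tape:
        generated_images = generator(noise, training=True)
        real_output = discriminator(images, training=True)
        fake_output = discriminator(generated_images, training=True)
        gen_loss = generator_loss(fake_output)
        disc_loss = discriminator_loss(real_output, fake_output)

    # two gradient() calls on the same tape are now allowed
    gradients_of_generator = tape.gradient(gen_loss, generator.trainable_variables)
    gradients_of_discriminator = tape.gradient(disc_loss, discriminator.trainable_variables)

    generator_optimizer.apply_gradients(zip(gradients_of_generator, generator.trainable_variables))
    discriminator_optimizer.apply_gradients(zip(gradients_of_discriminator, discriminator.trainable_variables))

    # release the resources held by the persistent tape
    del tape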
The technical reason is that gradient() is called twice, which is not allowed on (non-persistent) tapes.
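The restriction is easy to reproduce outside the GAN code. A minimal, self-contained sketch with made-up values:

import tensorflow as tf

x = tf.constant(3.0)
with tf.GradientTape() as tape:  # non-persistent by default
    tape.watch(x)                # constants are not watched automatically
    y = x * x
    z = y * y

dy_dx = tape.gradient(y, x)  # first call works; tape resources are released here
dz_dx = tape.gradient(z, x)  # second call raises the RuntimeError above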
In the present case, however, the underlying reason is that training GANs is typically done by alternating the optimization of the generator and the discriminator. Each optimization has its own optimizer, the two optimizers operate on different variables, and even the loss that is minimized is different (gen_loss and disc_loss in your code).
So you end up with two tapes because training a GAN essentially means optimizing two different (adversarial) problems in an alternating fashion.
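To make that alternating structure explicit, one common pattern is to give each sub-problem its own training step. The split below is a hypothetical sketch, not the tutorial's code, reusing the models, losses, and optimizers from the question:

@tf.function
def discriminator_step(images):
    noise = tf.random.normal([BATCH_SIZE, noise_dim])
    with tf.GradientTape() as disc_tape:
        generated_images = generator(noise, training=True)
        real_output = discriminator(images, training=True)
        fake_output = discriminator(generated_images, training=True)
        disc_loss = discriminator_loss(real_output, fake_output)
    # only the discriminator's variables are updated here
    grads = disc_tape.gradient(disc_loss, discriminator.trainable_variables)
    discriminator_optimizer.apply_gradients(zip(grads, discriminator.trainable_variables))

@tf.function
def generator_step():
    noise = tf.random.normal([BATCH_SIZE, noise_dim])
    with tf.GradientTape() as gen_tape:
        generated_images = generator(noise, training=True)
        fake_output = discriminator(generated_images, training=True)
        gen_loss = generator_loss(fake_output)
    # only the generator's variables are updated here
    grads = gen_tape.gradient(gen_loss, generator.trainable_variables)
    generator_optimizer.apply_gradients(zip(grads, generator.trainable_variables))

Each step builds its own tape over only the computation it needs, which is exactly what the two tapes in the tutorial's single train_step achieve.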