 

Generative adversarial networks tanh? [closed]

I was wondering why, in most of the GAN models I've seen (at least on MNIST), the activation function for both the discriminator and the generator is tanh. Isn't ReLU more efficient? (That's what I always read for predictive networks.)

Thanks!

Pusheen_the_dev asked Jan 05 '17


People also ask

Why is Tanh used in GANs?

This is because generated images are typically normalized to lie either in the range [0, 1] or in [-1, 1]. If you want your output images in [0, 1], use a sigmoid on the output layer; if you want them in [-1, 1], use tanh.
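For concreteness, here is a minimal sketch (assuming PyTorch, which the thread itself doesn't mention) of how the data normalization and the output activation have to agree: training images are scaled to [-1, 1], and a tanh output layer keeps generated samples in that same range.

```python
import torch
import torch.nn as nn

# Fake batch of MNIST-like images in [0, 1]
images = torch.rand(16, 1, 28, 28)

# Scale to [-1, 1] so they match a tanh-output generator
images_scaled = images * 2.0 - 1.0

# Output head of a generator: tanh bounds values to (-1, 1),
# a sigmoid would bound them to (0, 1) instead.
tanh_head = nn.Tanh()
sigmoid_head = nn.Sigmoid()

logits = torch.randn(16, 1, 28, 28)   # pre-activation generator output
fake_tanh = tanh_head(logits)         # values in (-1, 1)
fake_sigmoid = sigmoid_head(logits)   # values in (0, 1)

print(images_scaled.min().item(), images_scaled.max().item())
print(fake_tanh.min().item(), fake_tanh.max().item())
```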

Why did GAN mode collapse?

Mode collapse happens when the generator fails at the second of its two goals (covering the variety of the training data): all of the generated samples are very similar or even identical. The generator may "win" by creating one realistic data sample that always fools the discriminator, achieving realism at the cost of diversity.

Why are GANs so hard to train?

GANs are difficult to train because the generator and the discriminator are trained simultaneously in a game: improvements to one model come at the expense of the other.

Are GANs globally convergent?

Under the assumption that the data and generator distributions are absolutely continuous, GANs are locally convergent for small enough learning rates. However, this assumption does not hold for common use cases of GANs, where both distributions may lie on lower-dimensional manifolds (Sønderby et al., 2016; Arjovsky & Bottou, 2017).


2 Answers

From the DCGAN paper [Radford et al. https://arxiv.org/pdf/1511.06434.pdf]...

"The ReLU activation (Nair & Hinton, 2010) is used in the generator with the exception of the output layer which uses the Tanh function. We observed that using a bounded activation allowed the model to learn more quickly to saturate and cover the color space of the training distribution. Within the discriminator we found the leaky rectified activation (Maas et al., 2013) (Xu et al., 2015) to work well, especially for higher resolution modeling. This is in contrast to the original GAN paper, which used the maxout activation (Goodfellow et al., 2013)."

It could be that the symmetry of tanh is an advantage here, since the network should be treating darker colours and lighter colours in a symmetric way.
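A minimal sketch of that recipe, assuming PyTorch (the thread names no framework) and using illustrative layer sizes for 32x32 single-channel images rather than the exact paper architecture: ReLU in the generator's hidden layers, tanh on its output, and leaky ReLU throughout the discriminator.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, z_dim=100, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, ch * 4, 4, 1, 0), nn.BatchNorm2d(ch * 4),
            nn.ReLU(True),                     # ReLU in hidden layers
            nn.ConvTranspose2d(ch * 4, ch * 2, 4, 2, 1), nn.BatchNorm2d(ch * 2),
            nn.ReLU(True),
            nn.ConvTranspose2d(ch * 2, ch, 4, 2, 1), nn.BatchNorm2d(ch),
            nn.ReLU(True),
            nn.ConvTranspose2d(ch, 1, 4, 2, 1),
            nn.Tanh(),                         # bounded output in (-1, 1)
        )

    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, ch, 4, 2, 1),
            nn.LeakyReLU(0.2, inplace=True),   # leaky ReLU in the discriminator
            nn.Conv2d(ch, ch * 2, 4, 2, 1), nn.BatchNorm2d(ch * 2),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ch * 2, ch * 4, 4, 2, 1), nn.BatchNorm2d(ch * 4),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ch * 4, 1, 4, 1, 0),
            nn.Sigmoid(),                      # probability that the input is real
        )

    def forward(self, x):
        return self.net(x).view(-1)

# Quick shape check
z = torch.randn(8, 100, 1, 1)
fake = Generator()(z)            # (8, 1, 32, 32), values in (-1, 1)
score = Discriminator()(fake)    # (8,), values in (0, 1)
print(fake.shape, score.shape)
```

Note that the tanh at the end is the only bounded activation in the generator, which is exactly the point the quoted passage makes about saturating and covering the color space.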

Ben Carr answered Sep 28 '22


Sometimes it depends on the range that you want the activations to fall into. Whenever you hear "gates" in ML literature you'll probably see a sigmoid, whose output is between 0 and 1. In this case they want activations between -1 and 1, so they use tanh. This page says to use tanh, but doesn't give an explanation. DCGAN uses ReLUs or leaky ReLUs everywhere except for the output of the generator. That makes sense: what if half of your embedding becomes zeros? It might be better to have a smoothly varying embedding between -1 and 1.
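A quick numerical sketch of that point (again assuming PyTorch, purely illustrative): with zero-mean pre-activations, ReLU maps roughly half of the values to exactly zero, while tanh keeps a smooth, symmetric spread inside (-1, 1).

```python
import torch

x = torch.randn(10000)          # zero-mean pre-activations

relu_out = torch.relu(x)
tanh_out = torch.tanh(x)

# Roughly half of a zero-mean input is negative, so ReLU zeros it out
frac_zero = (relu_out == 0).float().mean().item()
print(f"fraction of ReLU outputs that are exactly 0: {frac_zero:.2f}")  # ~0.50

# tanh output stays smooth and symmetric in (-1, 1)
print(f"tanh output range: ({tanh_out.min().item():.2f}, {tanh_out.max().item():.2f})")
```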

I'd love to hear someone else's input, as I'm not sure.

chris answered Sep 28 '22