I was wondering why, in most of the GAN models I've seen (on MNIST at least), the activation function (for the discriminator and the generator) is tanh. Isn't ReLU more efficient? (That's what I always read for predictive networks.)
Thanks!
This is because the generated images are typically normalized to lie either in the range [0, 1] or [-1, 1]. So if you want your output images to be in [0, 1] you can use a sigmoid on the output layer, and if you want them to be in [-1, 1] you can use tanh.
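For example, here's a minimal sketch (PyTorch, with made-up layer sizes and a random tensor standing in for MNIST, none of which come from the thread) of how the data normalization and the output activation are paired:

```python
import torch
import torch.nn as nn

# Pair the data normalization with the output activation: mapping pixels from
# [0, 1] to [-1, 1] matches a tanh output layer; keep the data in [0, 1] and
# use a sigmoid on the output instead if you prefer that range.
real_batch = torch.rand(16, 1, 28, 28)   # stand-in for MNIST pixels in [0, 1]
real_batch = real_batch * 2.0 - 1.0      # same effect as Normalize((0.5,), (0.5,))

latent_dim = 100                         # illustrative latent size
generator = nn.Sequential(
    nn.Linear(latent_dim, 256),
    nn.ReLU(),
    nn.Linear(256, 28 * 28),
    nn.Tanh(),                           # samples land in [-1, 1], same range as the data
)

fake_batch = generator(torch.randn(16, latent_dim)).view(16, 1, 28, 28)
print(real_batch.min().item(), real_batch.max().item())   # within [-1, 1]
print(fake_batch.min().item(), fake_batch.max().item())   # within (-1, 1)
```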
Mode collapse happens when the generator fails to achieve Goal #2 (producing diverse samples that cover the data distribution), and all of the generated samples are very similar or even identical. The generator may "win" by creating one realistic data sample that always fools the discriminator, achieving Goal #1 (realism) by sacrificing Goal #2.
GANs are difficult to train because the generator and the discriminator are trained simultaneously in a two-player game: improvements to one model come at the expense of the other.
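To make that concrete, here's a toy alternating training loop (PyTorch, on 1-D data; the sizes, learning rates, and losses are illustrative assumptions, not from any answer here). Each step first pushes the discriminator to separate real from fake, then pushes the generator to fool the discriminator it was just trained against:

```python
import torch
import torch.nn as nn

# Toy 1-D GAN illustrating the simultaneous (alternating) training game.
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1), nn.Tanh())
D = nn.Sequential(nn.Linear(1, 16), nn.LeakyReLU(0.2), nn.Linear(16, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

real_data = 0.5 + 0.1 * torch.randn(64, 1)      # pretend "real" distribution in [-1, 1]

for step in range(200):
    # Discriminator step: label real as 1, generated as 0.
    z = torch.randn(64, 8)
    fake = G(z).detach()                        # don't backprop into G here
    d_loss = bce(D(real_data), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: try to make the (just updated) discriminator output 1.
    z = torch.randn(64, 8)
    g_loss = bce(D(G(z)), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```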
As a result, GANs are locally convergent for small enough learning rates in this case. However, the assumption of absolute continuity is not true for common use cases of GANs, where both distributions may lie on lower dimensional manifolds (Sønderby et al., 2016; Arjovsky & Bottou, 2017).
From the DCGAN paper [Radford et al. https://arxiv.org/pdf/1511.06434.pdf]...
"The ReLU activation (Nair & Hinton, 2010) is used in the generator with the exception of the output layer which uses the Tanh function. We observed that using a bounded activation allowed the model to learn more quickly to saturate and cover the color space of the training distribution. Within the discriminator we found the leaky rectified activation (Maas et al., 2013) (Xu et al., 2015) to work well, especially for higher resolution modeling. This is in contrast to the original GAN paper, which used the maxout activation (Goodfellow et al., 2013)."
It could be that the symmetry of tanh is an advantage here, since the network should be treating darker colours and lighter colours in a symmetric way.
Sometimes it depends on the range you want the activations to fall into. Whenever you see "gates" in the ML literature, you'll probably see a sigmoid, which is bounded between 0 and 1. In this case they want activations between -1 and 1, so they use tanh. This page says to use tanh, but doesn't give an explanation. DCGAN uses ReLUs or leaky ReLUs everywhere except the output of the generator. That makes sense: what if half of your embedding became zeros? It might be better to have a smoothly varying embedding between -1 and 1.
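As a quick sanity check on the "half of the embedding becomes zeros" point (just a toy comparison I ran on random pre-activations, not from any reference):

```python
import torch

# A ReLU output zeroes out every unit whose pre-activation was negative
# (roughly half, for zero-mean inputs), while tanh keeps a smoothly varying
# value in (-1, 1) for every unit.
torch.manual_seed(0)
pre_activations = torch.randn(10_000)   # stand-in for a layer's pre-activations

relu_out = torch.relu(pre_activations)
tanh_out = torch.tanh(pre_activations)

print("zeros after ReLU:", (relu_out == 0).float().mean().item())   # ~0.5
print("zeros after tanh:", (tanh_out == 0).float().mean().item())   # ~0.0
print("tanh range:", tanh_out.min().item(), tanh_out.max().item())  # within (-1, 1)
```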
I'd love to hear someone else's input, as I'm not sure.