The GAN originally proposed by I. J. Goodfellow uses the following loss functions:
D_loss = - log[D(X)] - log[1 - D(G(Z))]
G_loss = - log[D(G(Z))]
So the discriminator tries to minimize D_loss and the generator tries to minimize G_loss, where X and Z are the training input and the noise input respectively, and D(.) and G(.) are the mappings computed by the discriminator and generator neural networks respectively.
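For concreteness, here is a minimal sketch of these two losses written as plain functions of the discriminator's outputs (an illustrative NumPy snippet, not code from the paper; d_real stands for D(X) and d_fake for D(G(Z))):

    import numpy as np

    def d_loss(d_real, d_fake):
        # Discriminator loss: -log D(X) - log(1 - D(G(Z)))
        return -np.log(d_real) - np.log(1.0 - d_fake)

    def g_loss(d_fake):
        # Generator loss: -log D(G(Z))
        return -np.log(d_fake)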
As the original paper says, when the GAN is trained for enough steps it reaches a point where neither the generator nor the discriminator can improve further and D(Y) is 0.5 everywhere, where Y is any input to the discriminator. When the GAN is sufficiently trained to reach this point,
D_loss = - log(0.5) - log(1 - 0.5) = 0.693 + 0.693 = 1.386
G_loss = - log(0.5) = 0.693
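As a quick check of this arithmetic, evaluating the same expressions at the theoretical equilibrium D(.) = 0.5 (again, just an illustrative snippet):

    import math

    d_at_eq = 0.5  # at the theoretical equilibrium, D outputs 0.5 everywhere
    print(-math.log(d_at_eq) - math.log(1 - d_at_eq))  # D_loss ~ 1.386
    print(-math.log(d_at_eq))                          # G_loss ~ 0.693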
So, why can we not use the D_loss and G_loss values as metrics for evaluating a GAN?
If the two loss values deviate from these ideal values, then the GAN surely needs further training or a better-designed architecture. Theorem 1 in the original paper shows that these are the optimal values for D_loss and G_loss, so why can't they be used as an evaluation metric?
A GAN can have two loss functions: one for generator training and one for discriminator training.
Many loss functions have been developed and evaluated in an effort to improve the stability of training GAN models. The most common is the non-saturating loss in general, with the least-squares and Wasserstein losses used in larger and more recent GAN models.
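To make those names concrete, here is a rough sketch of the corresponding generator objectives (my own shorthand, not taken from any particular library; d_fake is the discriminator's probability for a generated sample, and critic_fake is an unbounded critic score as used in Wasserstein GANs):

    import numpy as np

    def g_loss_non_saturating(d_fake):
        # Non-saturating generator loss: maximize log D(G(Z))
        return -np.log(d_fake)

    def g_loss_least_squares(d_fake):
        # Least-squares (LSGAN) generator loss: push D(G(Z)) toward the real label 1
        return 0.5 * (d_fake - 1.0) ** 2

    def g_loss_wasserstein(critic_fake):
        # Wasserstein generator loss: maximize the critic's score on generated samples
        return -critic_fake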
The adversarial loss influences whether the generator model can output images that are plausible in the target domain, whereas the L1 loss regularizes the generator model to output images that are a plausible translation of the source image.
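That sentence describes an image-to-image translation setup (e.g. pix2pix-style conditional GANs). A sketch of one common way to combine the two terms, using a hypothetical weight lambda_l1:

    import numpy as np

    def translation_g_loss(d_fake, generated, target, lambda_l1=100.0):
        # lambda_l1 is a hypothetical weight; tune it for your setup
        # Adversarial term: encourage outputs the discriminator finds plausible
        adv = -np.log(d_fake)
        # L1 term: keep the output close to the reference translation of the source image
        l1 = np.mean(np.abs(generated - target))
        return adv + lambda_l1 * l1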
The JSD satisfies all properties of the KL divergence, and has the additional perk that D_JSD[p, q] = D_JSD[q, p]. With this distance metric, the optimal generator for the GAN objective becomes p_G = p_data, and the optimal objective value that we can achieve with the optimal generator G*(.) and discriminator D*_G*(x) is -log 4.
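Spelled out in the same notation as the question (a sketch of the standard decomposition behind that value, where V(G, D) is the paper's value function and D*_G is the optimal discriminator for a fixed G):

V(G, D*_G) = -log(4) + 2 * D_JSD[p_data, p_G]

Since the JSD term is non-negative and equals zero exactly when p_G = p_data, the minimum over G is -log(4), about -1.386, which matches, up to sign conventions, the D_loss of 1.386 computed above.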
I think this question belongs on Cross-Validated, but anyway:
I struggled with this for quite some time, and wondered why the question wasn't asked. What follows is where I'm currently at. Not sure if it'll help you, but it is some of my intuition.
G and D losses are good indicators of failure cases...
Of course, if the G loss is a really big number and the D loss is zero, then nothing good is happening in your GAN.
... but not good indicators of performance.
I've trained a bunch of GANs and have almost never seen the "0.5/0.5 case" except on very simple examples. Most of the time, you're happy when the outputs D(x) and D(G(z)) (and therefore the losses) are more or less stable. So don't take these values as a "gold standard".
A key intuition I was missing was the simultaneity of G and D training. At the beginning, sure, G is really bad at generating stuff, but D is also really bad at discriminating it. As time passes, G gets better, but D also gets better. So after many epochs, we can assume that D is really good at discriminating between fake and real. Therefore, even if G "fools" D only 5% of the time (i.e. D(x) = 0.95 and D(G(z)) = 0.05), it can mean that G is actually pretty good, because it sometimes fools a really good discriminator.
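To put numbers on that example (using the same loss expressions as in the question; purely illustrative arithmetic):

    import math

    d_real, d_fake = 0.95, 0.05  # D is right about 95% of the time
    d_loss = -math.log(d_real) - math.log(1 - d_fake)  # ~0.103, far below 1.386
    g_loss = -math.log(d_fake)                         # ~3.0, far above 0.693
    print(d_loss, g_loss)

So the losses sit far from the "ideal" 1.386 / 0.693 values even though, by the argument above, G may be producing good samples.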
As you know, there are no reliable metrics of image quality besides looking at it for the moment, but I've found that for my use cases, G could produce great images while fooling D only a few percent of the time.
A corollary to this simultaneous training is what happens at the beginning of training: you can have D(X) = 0.5 and D(G(Z)) = 0.5, and still have G produce almost random images: it's just that D is not yet good enough to tell them apart from real images.
I see it's been a couple of months since you posted this question. If you've gained intuition in the meantime, I'd be happy to hear it!