What is the ideal value of loss function for a GAN

The GAN originally proposed by I. J. Goodfellow uses the following loss functions:

D_loss = - log[D(X)] - log[1 - D(G(Z))]

G_loss = - log[D(G(Z))]

So the discriminator tries to minimize D_loss and the generator tries to minimize G_loss, where X and Z are the training input and the noise input respectively. D(.) and G(.) are the mappings computed by the discriminator and generator neural networks respectively.

As the original paper says, when a GAN is trained for enough steps it reaches a point where neither the generator nor the discriminator can improve, and D(Y) is 0.5 everywhere, where Y is any input to the discriminator. Once the GAN has been trained to this point,

D_loss = - log(0.5) - log(1 - 0.5) = 0.693 + 0.693 = 1.386

G_loss = - log(0.5) = 0.693
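
The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch, assuming natural logarithms and scalar discriminator outputs; the helper names `d_loss` and `g_loss` are made up for illustration and do not come from the paper.

```python
import math

def d_loss(d_real, d_fake):
    """Discriminator loss: -log D(X) - log(1 - D(G(Z)))."""
    return -math.log(d_real) - math.log(1.0 - d_fake)

def g_loss(d_fake):
    """Generator loss: -log D(G(Z))."""
    return -math.log(d_fake)

# Equilibrium case: D outputs 0.5 for every input.
eq_d = d_loss(0.5, 0.5)  # 2 * ln(2)
eq_g = g_loss(0.5)       # ln(2)
print(round(eq_d, 3), round(eq_g, 3))  # 1.386 0.693
```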

So, why can we not use D_loss and G_loss values as a metric for evaluating GAN?

If the two losses deviate from these ideal values, then the GAN surely needs more training or a better architecture. Theorem 1 in the original paper discusses that these are the optimal values for D_loss and G_loss, so why can't they be used as an evaluation metric?

738

asked Mar 22 '18 04:03

Vinay Joshi

1 Answer

I think this question belongs on Cross Validated, but anyway:

I struggled with this for quite some time, and wondered why the question wasn't asked. What follows is where I'm currently at. Not sure if it'll help you, but it is some of my intuition.

G and D losses are good indicators of failure cases...
Of course, if G loss is a really big number and D loss is zero, then nothing good is happening in your GAN.

... but not good indicators of performance.
I've trained a bunch of GANs and have almost never seen the "0.5/0.5 case" except on very simple examples. Most of the time, you're happy when the outputs D(x) and D(G(z)) (and therefore the losses) are more or less stable. So don't take these values as a "gold standard".
A key intuition I was missing was the simultaneity of G and D training. At the beginning, sure, G is really bad at generating stuff, but D is also really bad at discriminating it. As time passes, G gets better, but D also gets better. So after many epochs, we can assume that D is really good at discriminating between fake and real. Therefore, even if G "fools" D only 5% of the time (i.e. D(x)=0.95 and D(G(z))=0.05), it can mean that G is actually pretty good, because it sometimes fools a really good discriminator.
As you know, for the moment there are no reliable metrics of image quality besides looking at the images, but I've found that for my use cases, G could produce great images while fooling D only a few percent of the time.
A corollary of this simultaneous training is what happens at the beginning of training: you can have D(X)=0.5 and D(G(Z))=0.5 and still have G produce almost random images; it's just that D is not yet good enough to tell them apart from real images.
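
To put numbers on these two situations, here is a rough sketch, assuming the loss definitions from the question with natural logarithms. The function names and the two scenarios are hypothetical and only mirror the values mentioned above; they are not measurements from a real training run.

```python
import math

def d_loss(d_real, d_fake):
    # -log D(X) - log(1 - D(G(Z)))
    return -math.log(d_real) - math.log(1.0 - d_fake)

def g_loss(d_fake):
    # -log D(G(Z))
    return -math.log(d_fake)

# Scenario 1: a strong D that G fools only 5% of the time.
strong = (d_loss(0.95, 0.05), g_loss(0.05))

# Scenario 2: an untrained D that outputs 0.5 for everything,
# regardless of what G produces.
untrained = (d_loss(0.5, 0.5), g_loss(0.5))

print("strong D:    D_loss=%.3f G_loss=%.3f" % strong)     # 0.103, 2.996
print("untrained D: D_loss=%.3f G_loss=%.3f" % untrained)  # 1.386, 0.693
```

The second scenario lands exactly on the "ideal" values even though nothing in the calculation says anything about the quality of G, while the first is far from them even though G may be producing good images. The loss values alone cannot tell the two apart.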

I see it's been a couple of months since you posted this question. If you've gained intuition in the meantime, I'd be happy to hear it!

161

answered Nov 15 '22 07:11

Soltius

Tags: neural-network, objective-function, loss, generative-adversarial-network
