Can the Wasserstein loss be negative?

I'm currently training a WGAN in Keras with the (approximate) Wasserstein loss below:

from keras import backend as K

def wasserstein_loss(y_true, y_pred):
    return K.mean(y_true * y_pred)

However, this loss can obviously be negative, which is weird to me.

I trained the WGAN for 200 epochs and got the critic Wasserstein loss training curve below.

[Plot: critic Wasserstein loss training curve]

The above loss is calculated by

import numpy as np

# Labels: +1 for real, -1 for fake; train_on_batch returns [loss, metric]
d_loss_valid = critic.train_on_batch(real, np.ones((batch_size, 1)))
d_loss_fake = critic.train_on_batch(fake, -np.ones((batch_size, 1)))
d_loss, _ = 0.5 * np.add(d_loss_valid, d_loss_fake)
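
To spell out what those labels do: wasserstein_loss multiplies the critic output by ±1, so the two calls return mean(D(real)) and -mean(D(fake)), and their average goes negative whenever the fake samples score higher than the real ones. A quick NumPy check with made-up critic scores (hypothetical numbers, not from my model):

import numpy as np

# Hypothetical critic scores for one batch
d_real = np.array([0.2, -0.1, 0.4])
d_fake = np.array([0.5, 0.3, 0.6])

loss_valid = np.mean(1.0 * d_real)   # labels +1 -> mean(D(real))
loss_fake = np.mean(-1.0 * d_fake)   # labels -1 -> -mean(D(fake))

print(0.5 * (loss_valid + loss_fake))  # -0.15: negative, as observed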

The resulting sample quality is great, so I think I trained the WGAN correctly. However, I still cannot understand why the Wasserstein loss can be negative while the model still works. According to the original WGAN paper, the Wasserstein loss can serve as a performance indicator for the GAN, so how should we interpret it? Am I misunderstanding anything?

asked Jul 19 '19 by Piggy Wenzhou


2 Answers

The Wasserstein loss is an estimate of the Earth Mover's distance, which measures the difference between two probability distributions. In TensorFlow it is typically implemented as d_loss = tf.reduce_mean(d_fake) - tf.reduce_mean(d_real), which is obviously negative whenever the critic's scores for fake samples fall on the other side of its scores for real samples. You can see this in your plot: during training, the real and fake score distributions keep changing sides until they converge around zero. So, as a performance measure, the loss tells you both how far the generator currently is from the real data and on which side of it the critic places the generated samples.
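
A minimal numeric sketch of that sign behaviour (made-up scores, nothing model-specific):

import tensorflow as tf

# Made-up critic scores: fakes currently sit below reals
d_real = tf.constant([0.4, 0.5, 0.6])
d_fake = tf.constant([0.1, 0.2, 0.3])

d_loss = tf.reduce_mean(d_fake) - tf.reduce_mean(d_real)
print(float(d_loss))  # -0.3; swap the two tensors and it becomes +0.3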

See the distributions plot:

[Plot: real and fake distributions]

P.S. That plot shows the cross-entropy loss, not the Wasserstein loss. Perhaps this article can help you more, if you haven't read it yet. However, another question is how the optimizer can minimize that negative loss (i.e. drive it toward zero).

answered Oct 12 '22 by Sergiy Isakov


It looks like I cannot comment on the answer given by Sergiy Isakov because I do not have enough reputation, so I will answer here instead. I wanted to comment because I think that information is not correct.

In principle, the Wasserstein distance cannot be negative, because a distance metric cannot be negative. The actual expression for the Wasserstein distance (its dual form) involves a supremum over all 1-Lipschitz functions (reproduced below). Since it is a supremum, we always take the 1-Lipschitz function that gives the largest value of the objective, and that largest value is the Wasserstein distance. However, what we compute with a WGAN is just an estimate, not the real Wasserstein distance: the critic only approximates the supremum. If the critic is given too few inner iterations, it may not have enough updates to move the estimate back to a positive value.
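
For reference, here is the standard Kantorovich-Rubinstein dual form used in the WGAN paper, writing P_r and P_g for the real and generated distributions:

W(P_r, P_g) = \sup_{\|f\|_L \le 1} \mathbb{E}_{x \sim P_r}[f(x)] - \mathbb{E}_{x \sim P_g}[f(x)]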

Thought experiment: suppose we obtain a Wasserstein estimate that is negative. We can always negate the critic function to make the estimate positive, which means there exists a 1-Lipschitz function giving a positive value larger than the one that gave the negative value. So the true Wasserstein distance cannot be negative, since by definition we take the supremum over all 1-Lipschitz functions; only the WGAN estimate can.
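
Spelled out: write v(f) = \mathbb{E}_{P_r}[f] - \mathbb{E}_{P_g}[f]. For any 1-Lipschitz f, the function -f is also 1-Lipschitz and v(-f) = -v(f), so the supremum dominates both values:

\sup_{\|f\|_L \le 1} v(f) \;\ge\; \max\bigl(v(f),\, v(-f)\bigr) = |v(f)| \;\ge\; 0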

answered Oct 12 '22 by Sherine Brahma