 

Training stability of Wasserstein GANs

I am working on a project with Wasserstein GANs, and more specifically with an implementation of the improved version of Wasserstein GANs. I have two theoretical questions about WGANs regarding their stability and training process. Firstly, the value of the loss function is notoriously correlated with the quality of the generated samples (that is stated here). Is there any additional literature that supports that claim?

Secondly, during my experimental phase, I noticed that training my architecture as a WGAN is much faster than training it as a simple GAN. Is that common behavior? Is there any literature that analyses this?

Furthermore, one question about the continuity that is guaranteed by using the Wasserstein loss. I am having some trouble understanding this concept in practice: what does it mean that the normal GAN loss is not a continuous function?

asked Apr 06 '20 by Jose Ramon

People also ask

Why is training a GAN unstable?

GANs can sometimes suffer from the limitation of generating samples that are poorly representative of the population. For example, after training a GAN on the MNIST dataset, it may happen that the generator is unable to generate any digit other than 0. This condition is called mode collapse.

When should I stop WGAN training?

With traditional GANs, pretty much the only way of telling if the generated samples are improving is via visual inspection, and you stop the training when the visual quality of the samples is satisfying.

Which training methods for GANs do actually converge?

Our analysis shows that GAN training with instance noise or zero-centered gradient penalties converges.


1 Answer

  1. You can check the Inception Score and the Frechet Inception Distance for now, and also here. The problem is that, because GANs do not have a unified objective function (there are two networks), there is no agreed way of evaluating and comparing GAN models. Instead, people devise metrics that relate the image distribution and the generator distribution. A minimal sketch of the FID computation is shown after this list.

  2. A WGAN can be faster because its training procedure is more stable than that of a vanilla GAN (the Wasserstein metric, weight clipping, and the gradient penalty, if you are using it). I don't know of a literature analysis of training speed, and it may not always be the case that a WGAN trains faster than a simple GAN. A WGAN also cannot find the best Nash equilibrium the way a GAN can. A sketch of the gradient-penalty term is shown after this list.

  3. Think of two distributions, p and q. If these distributions overlap, i.e. their domains overlap, then the KL or JS divergence is differentiable. The problem arises when p and q do not overlap. As in the example from the WGAN paper, take two distributions on a 2D space, P = (0, Z) and Q = (K, Z), where K is different from 0 and Z is sampled from a uniform distribution. If you try to take the derivative of the KL/JS divergence between these two distributions with respect to K, you cannot: the divergence behaves like a binary indicator (equal or not), and we cannot take the derivative of such a function. However, if we use the Wasserstein loss or Earth-Mover distance, we can, since it is approximated as a distance between points in space. Short story: the normal GAN loss is continuous only if the distributions overlap; otherwise it is discontinuous. The numeric check in the addendum at the end of this answer illustrates this.
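For point 1, here is a minimal sketch of how the Frechet Inception Distance is computed from two sets of features. It is an illustration, not a reference implementation: the function name `frechet_distance` and the placeholder arrays `real_feats` / `fake_feats` are mine, and in practice the features would be activations of a pretrained Inception network rather than random numbers.

```python
import numpy as np
from scipy import linalg

def frechet_distance(real_feats, fake_feats):
    """FID between two feature sets of shape (n_samples, feature_dim)."""
    mu_r, mu_f = real_feats.mean(axis=0), fake_feats.mean(axis=0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_f = np.cov(fake_feats, rowvar=False)

    # Matrix square root of the product of covariances; small imaginary
    # parts caused by numerical error are discarded.
    covmean = linalg.sqrtm(cov_r @ cov_f)
    if np.iscomplexobj(covmean):
        covmean = covmean.real

    diff = mu_r - mu_f
    return diff @ diff + np.trace(cov_r + cov_f - 2.0 * covmean)

# Toy usage with random features (real code would use Inception activations):
rng = np.random.default_rng(0)
real_feats = rng.normal(0.0, 1.0, size=(500, 64))
fake_feats = rng.normal(0.5, 1.0, size=(500, 64))
print(frechet_distance(real_feats, fake_feats))
```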

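For point 2, here is a minimal PyTorch sketch of the gradient-penalty term used in WGAN-GP, assuming `critic`, `real`, and `fake` are a critic network and batches of image tensors of shape (N, C, H, W); the function name and the `lambda_gp` default are placeholders.

```python
import torch

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    """Push the critic's gradient norm towards 1 on points interpolated
    between real and fake samples."""
    batch_size = real.size(0)
    # One random interpolation coefficient per sample, broadcast over C, H, W.
    eps = torch.rand(batch_size, 1, 1, 1, device=real.device)
    interpolated = (eps * real + (1 - eps) * fake).requires_grad_(True)

    scores = critic(interpolated)
    grads = torch.autograd.grad(
        outputs=scores, inputs=interpolated,
        grad_outputs=torch.ones_like(scores),
        create_graph=True, retain_graph=True,
    )[0]

    grad_norm = grads.reshape(batch_size, -1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1.0) ** 2).mean()

# In the critic's training step (sketch):
# d_loss = critic(fake).mean() - critic(real).mean() \
#          + gradient_penalty(critic, real, fake)
```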
Hope this helps
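Addendum to point 3: the closed forms below come from the parallel-lines example in the WGAN paper (the JS divergence jumps to log 2 as soon as K is nonzero, while the Earth-Mover distance is |K|). The short script just tabulates them to show which one gives a usable gradient with respect to K.

```python
import numpy as np

# WGAN-paper example: P is the distribution of (0, Z) and Q_K of (K, Z),
# with Z ~ Uniform(0, 1), i.e. two parallel vertical line segments.

def js_divergence(k):
    # Supports are disjoint for any k != 0, so JS jumps straight to log(2);
    # it carries no gradient information about k.
    return 0.0 if k == 0 else np.log(2.0)

def wasserstein_distance(k):
    # Earth-Mover distance: transport every point horizontally by |k|.
    return abs(k)

for k in [-1.0, -0.1, 0.0, 0.1, 1.0]:
    print(f"k = {k:+.1f}   JS = {js_divergence(k):.4f}   "
          f"W1 = {wasserstein_distance(k):.4f}")
```

Only the Wasserstein distance changes smoothly with K, which is why its gradient can guide the generator even when the two distributions do not overlap.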

answered Oct 01 '22 by Emir Ceyani