Why is the input scaled in tf.nn.dropout in TensorFlow?

Tags: machine-learning, neural-network, tensorflow, deep-learning

I can't understand why dropout works the way it does in TensorFlow. The CS231n blog says that "dropout is implemented by only keeping a neuron active with some probability p (a hyperparameter), or setting it to zero otherwise." The same site illustrates this with a figure (https://i.stack.imgur.com/SbXq1.jpg).

The TensorFlow site says: "With probability keep_prob, outputs the input element scaled up by 1 / keep_prob, otherwise outputs 0."

So why is the input element scaled up by 1/keep_prob? Why not keep the input element as it is with probability keep_prob, without scaling it by 1/keep_prob?

330

asked Jan 04 '16 18:01

Shubhashis

2 Answers

This scaling enables the same network to be used for training (with keep_prob < 1.0) and evaluation (with keep_prob == 1.0). From the Dropout paper:

"The idea is to use a single neural net at test time without dropout. The weights of this network are scaled-down versions of the trained weights. If a unit is retained with probability p during training, the outgoing weights of that unit are multiplied by p at test time as shown in Figure 2."

Rather than adding ops to scale down the weights by keep_prob at test time, the TensorFlow implementation adds an op to scale up the kept activations by 1. / keep_prob at training time. Either way, the expected input to the next layer during training matches what it receives at test time. The effect on performance is negligible, and the code is simpler (because we use the same graph and treat keep_prob as a tf.placeholder() that is fed a different value depending on whether we are training or evaluating the network).
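The difference between the two schemes can be illustrated with a small pure-Python sketch. This is not TensorFlow's implementation; the function names dropout_inverted and dropout_plain are hypothetical and only mimic the behaviour described above, using the standard library's random module.

```python
import random

def dropout_inverted(values, keep_prob, rng):
    """Keep each element with probability keep_prob and scale the kept
    elements by 1 / keep_prob; every other element becomes 0."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

def dropout_plain(values, keep_prob, rng):
    """Keep each element with probability keep_prob, with no scaling."""
    return [v if rng.random() < keep_prob else 0.0 for v in values]

rng = random.Random(0)          # fixed seed so the run is repeatable
activations = [1.0] * 100_000   # illustrative activations, all equal to 1
keep_prob = 0.5

# Mean activation seen by the next layer during training.
train_inverted = sum(dropout_inverted(activations, keep_prob, rng)) / len(activations)
train_plain = sum(dropout_plain(activations, keep_prob, rng)) / len(activations)

# Evaluation: the same function with keep_prob == 1.0 returns the input unchanged.
eval_inverted = dropout_inverted(activations, 1.0, rng)

print(train_inverted)  # close to 1.0, the mean without dropout
print(train_plain)     # close to 0.5, so test time would need a factor of keep_prob
```

With the scaled version, the training-time mean already matches the no-dropout mean, so the same code path can serve evaluation by feeding keep_prob = 1.0. With the unscaled version, a separate factor of keep_prob would have to be applied at test time.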

170

answered Oct 04 '22 21:10

mrry

Say the network has n neurons and we apply a dropout rate of 1/2.

Training phase: on average we are left with n/2 active neurons. So if you expected an output x with all the neurons active, you now get only about x/2. For every batch, the network weights are therefore trained against this x/2.

Testing/inference/validation phase: we don't apply any dropout, so the output is x rather than the x/2 the weights were trained with, which would give an incorrect result. One fix is to scale the output down to x/2 during testing.

Instead of adding that scaling only in the testing phase, TensorFlow's dropout layer scales the kept outputs up by 1/keep_prob during training, so that the expected sum is the same with dropout (training) and without it (testing).
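The expected-sum argument can be written out as a short derivation. The symbols are assumptions introduced here for illustration: a_i is the contribution of neuron i, x = \sum_i a_i is the output with all neurons active, and m_i is a 0/1 mask that equals 1 with probability p = keep_prob.

```latex
\begin{align*}
\text{unscaled dropout:}\quad
  \mathbb{E}\Big[\sum_{i=1}^{n} m_i a_i\Big]
    &= \sum_{i=1}^{n} \mathbb{E}[m_i]\, a_i = p \sum_{i=1}^{n} a_i = p\,x \\
\text{scaled dropout:}\quad
  \mathbb{E}\Big[\sum_{i=1}^{n} \frac{m_i}{p}\, a_i\Big]
    &= \frac{1}{p} \cdot p\,x = x
\end{align*}
```

With p = 1/2 the unscaled sum is x/2 on average, as in the example above, while the scaled sum averages x, which is what the network produces at test time when nothing is dropped.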

answered Oct 04 '22 21:10

Trideep Rath
