 

Keras Dropout with noise_shape

I have a question about the Keras Dropout function and its noise_shape argument.

Question 1:

What does the documentation mean by "if your inputs have shape (batch_size, timesteps, features) and you want the dropout mask to be the same for all timesteps, you can use noise_shape=(batch_size, 1, features)", and what is the benefit of adding this argument?

Does it mean the number of neurons that is dropped out stays the same along the time steps? In other words, at every timestep t, would the same n neurons be dropped?
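For reference, the behaviour can be checked directly; a minimal sketch assuming TF 2.x / tf.keras with eager execution (toy shapes chosen purely for illustration):

import numpy as np
import tensorflow as tf

# Toy input: (batch_size=2, timesteps=3, features=4)
x = tf.ones((2, 3, 4))

# One mask per sample, shared across all timesteps
drop = tf.keras.layers.Dropout(0.5, noise_shape=(2, 1, 4))
y = drop(x, training=True).numpy()

# Every timestep of a given sample shows the same zero pattern:
print(np.all(y == y[:, :1, :]))  # True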

Question 2: Do I have to include 'batch_size' in noise_shape when creating models? --> see the following example.

Suppose I have multivariate time series data of shape (10000, 1, 100, 2) --> (number of samples, channels, timesteps, number of features).

Then I create batches with a batch size of 64 --> (64, 1, 100, 2).

If I want to create a CNN model with dropout, I use the Keras functional API:

from keras.layers import Input, Conv2D, MaxPooling2D, Dropout

inp = Input([1, 100, 2])
conv1 = Conv2D(64, kernel_size=(11, 2), strides=(1, 1),
               data_format='channels_first')(inp)
max1 = MaxPooling2D((2, 1))(conv1)
max1_shape = max1._keras_shape
drop1 = Dropout(0.1, noise_shape=[?, max1_shape[1], 1, 1])(max1)  # what goes in place of `?`

Because the output shape of layer max1 should be (None, 64, 50, 1), I cannot assign None to the question mark (which corresponds to batch_size).

I wonder how I should cope with this. Should I just use (64, 1, 1) as the noise_shape? Or should I define a variable called batch_size and then pass it to this argument, like (batch_size, 64, 1, 1)?

asked Oct 05 '17 by yihao.fu

People also ask

Does dropout have trainable parameters?

A dropout layer does not have any trainable parameters, so nothing in it is updated during training. Dropout is a technique used to combat overfitting. The layer takes a rate between 0 and 1.

How do you implement dropout in Keras?

Dropout is easily implemented in Keras by randomly selecting nodes to be dropped out with a given probability (e.g., 20%) in each weight update cycle.
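For instance, a minimal usage sketch, assuming tf.keras (the layer sizes here are arbitrary):

from tensorflow import keras

# Dropout(0.2) zeroes 20% of the previous layer's activations
# on each training step; it is a no-op at inference time.
model = keras.Sequential([
    keras.layers.Dense(64, activation='relu', input_shape=(100,)),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(1),
])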

How do you define a dropout in Keras?

The Dropout layer randomly sets input units to 0 with a frequency of rate at each step during training time, which helps prevent overfitting. Inputs not set to 0 are scaled up by 1/(1 - rate) such that the sum over all inputs is unchanged.
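For example, with rate=0.2 the surviving inputs are multiplied by 1/(1 - 0.2) = 1.25. A quick check, assuming TF 2.x with eager execution:

import tensorflow as tf

x = tf.ones((1, 4))
y = tf.keras.layers.Dropout(0.2)(x, training=True)
print(y.numpy())  # dropped entries are 0.0, kept entries are scaled to 1.25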

How do you change the dropout rate?

A good rule of thumb is to divide the number of nodes in the layer before dropout by the proposed dropout rate and use that as the number of nodes in the new network that uses dropout. For example, a network with 100 nodes and a proposed dropout rate of 0.5 will require 200 nodes (100 / 0.5) when using dropout.


1 Answer

Question 1:

It's kind of like a NumPy broadcast, I think.

Imagine you have 2 samples with 3 timesteps and 4 features (a small example to make it easier to show): shape (2, 3, 4).

If you use a noise_shape of (2, 1, 4), each sample will have its own dropout mask, and that mask will be applied to all of its timesteps.

So let's say these are the input values, of shape (2, 3, 4):

array([[[  1,   2,   3,   4],
        [  5,   6,   7,   8],
        [ 10,  11,  12,  13]],

       [[ 14,  15,  16,  17],
        [ 18,  19,  20,  21],
        [ 22,  23,  24,  25]]])

And this would be a random dropout mask of shape (2, 1, 4) (1 means keep, 0 means drop):

array([[[ 1,  1,  1,  0]],

       [[ 1,  0,  0,  1]]])

So you have these two masks (one per sample). They then get broadcast along the timestep axis:

array([[[ 1,  1,  1,  0],
        [ 1,  1,  1,  0],
        [ 1,  1,  1,  0]],

       [[ 1,  0,  0,  1],
        [ 1,  0,  0,  1],
        [ 1,  0,  0,  1]]])

and applied to the input:

array([[[  1,   2,   3,   0],
        [  5,   6,   7,   0],
        [ 10,  11,  12,   0]],

       [[ 14,   0,   0,  17],
        [ 18,   0,   0,  21],
        [ 22,   0,   0,  25]]])
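The same broadcast can be reproduced in a few lines of NumPy (this illustrates only the masking; real Dropout additionally rescales the kept values by 1/(1 - rate)):

import numpy as np

x = np.array([[[ 1,  2,  3,  4],
               [ 5,  6,  7,  8],
               [10, 11, 12, 13]],
              [[14, 15, 16, 17],
               [18, 19, 20, 21],
               [22, 23, 24, 25]]])   # shape (2, 3, 4)

mask = np.array([[[1, 1, 1, 0]],
                 [[1, 0, 0, 1]]])    # shape (2, 1, 4)

print(x * mask)                      # mask broadcasts along the timestep axis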

Question 2:

I'm not sure about your second question to be honest.

Edit: What you can do is take the first dimension of the shape of the input, which should be the batch_size, as proposed in this github issue:

import tensorflow as tf

...

batch_size = tf.shape(inp)[0]
drop1 = Dropout(0.1, noise_shape=[batch_size, max1._keras_shape[1], 1, 1])(max1)

As you can see, I'm on the TensorFlow backend. I don't know whether Theano has the same problem, but if it does, you might be able to solve it with the equivalent Theano shape function.
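Putting it together with the model from the question, a minimal sketch (assuming the standalone Keras of that era on the TensorFlow backend; note that data_format='channels_first' is also passed to MaxPooling2D here, and the sketch is untested):

import tensorflow as tf
from keras.layers import Input, Conv2D, MaxPooling2D, Dropout
from keras.models import Model

inp = Input([1, 100, 2])
conv1 = Conv2D(64, kernel_size=(11, 2), strides=(1, 1),
               data_format='channels_first')(inp)
max1 = MaxPooling2D((2, 1), data_format='channels_first')(conv1)

# Resolve the batch size as a symbolic tensor so that None never has
# to appear in noise_shape; the mask is shared over the last two axes.
batch_size = tf.shape(inp)[0]
drop1 = Dropout(0.1,
                noise_shape=[batch_size, max1._keras_shape[1], 1, 1])(max1)

model = Model(inputs=inp, outputs=drop1)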

answered Sep 21 '22 by Nima Mousavi