 

Keras Dropout with noise_shape

I have a question about the Keras Dropout function and its noise_shape argument.

Question 1:

What does the documentation mean by "if your inputs have shape (batch_size, timesteps, features) and you want the dropout mask to be the same for all timesteps, you can use noise_shape=(batch_size, 1, features)", and what is the benefit of adding this argument?

Does it mean the number of neurons that is dropped out stays the same along the time steps? In other words, at every timestep t, would the same n neurons be dropped?
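For reference, the behaviour can be checked directly; a minimal sketch assuming TF 2.x / tf.keras with eager execution (toy shapes chosen purely for illustration):

import numpy as np
import tensorflow as tf

# Toy input: (batch_size=2, timesteps=3, features=4)
x = tf.ones((2, 3, 4))

# One mask per sample, shared across all timesteps
drop = tf.keras.layers.Dropout(0.5, noise_shape=(2, 1, 4))
y = drop(x, training=True).numpy()

# Every timestep of a given sample shows the same zero pattern:
print(np.all(y == y[:, :1, :]))  # True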

Question 2: Do I have to include 'batch_size' in noise_shape when creating models? --> see the following example.

Suppose I have multivariate time series data of shape (10000, 1, 100, 2) --> (number of samples, channels, timesteps, number of features).

Then I create batches with a batch size of 64 --> (64, 1, 100, 2).

If I want to create a CNN model with dropout, I use the Keras functional API:

from keras.layers import Input, Conv2D, MaxPooling2D, Dropout

inp = Input([1, 100, 2])
conv1 = Conv2D(64, kernel_size=(11, 2), strides=(1, 1),
               data_format='channels_first')(inp)
max1 = MaxPooling2D((2, 1))(conv1)
max1_shape = max1._keras_shape
drop1 = Dropout(0.1, noise_shape=[?, max1_shape[1], 1, 1])(max1)  # what goes in place of `?`

Because the output shape of layer max1 should be (None, 64, 50, 1), I cannot assign None to the question mark (which corresponds to batch_size).

I wonder how I should cope with this. Should I just use (64, 1, 1) as the noise_shape? Or should I define a variable called batch_size and then pass it to this argument, like (batch_size, 64, 1, 1)?

asked Oct 05 '17 by yihao.fu

People also ask

Does dropout have trainable parameters?

A dropout layer does not have any trainable parameters, so nothing in it is updated during training. Dropout is a technique used to combat overfitting. The layer takes a rate between 0 and 1.

How do you implement dropout in Keras?

Dropout is easily implemented in Keras by randomly selecting nodes to be dropped out with a given probability (e.g., 20%) in each weight update cycle.
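For instance, a minimal usage sketch, assuming tf.keras (the layer sizes here are arbitrary):

from tensorflow import keras

# Dropout(0.2) zeroes 20% of the previous layer's activations
# on each training step; it is a no-op at inference time.
model = keras.Sequential([
    keras.layers.Dense(64, activation='relu', input_shape=(100,)),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(1),
])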

How do you define a dropout in Keras?

The Dropout layer randomly sets input units to 0 with a frequency of rate at each step during training time, which helps prevent overfitting. Inputs not set to 0 are scaled up by 1/(1 - rate) such that the sum over all inputs is unchanged.
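For example, with rate=0.2 the surviving inputs are multiplied by 1/(1 - 0.2) = 1.25. A quick check, assuming TF 2.x with eager execution:

import tensorflow as tf

x = tf.ones((1, 4))
y = tf.keras.layers.Dropout(0.2)(x, training=True)
print(y.numpy())  # dropped entries are 0.0, kept entries are scaled to 1.25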

How do you change the dropout rate?

A good rule of thumb is to divide the number of nodes in the layer before dropout by the proposed dropout rate and use that as the number of nodes in the new network that uses dropout. For example, a network with 100 nodes and a proposed dropout rate of 0.5 will require 200 nodes (100 / 0.5) when using dropout.


1 Answer

Question 1:

It's kind of like a NumPy broadcast, I think.

Imagine you have 2 samples with 3 timesteps and 4 features (a small example to make it easier to show): shape (2, 3, 4).

If you use a noise_shape of (2, 1, 4), each sample will have its own dropout mask, and that mask will be applied to all of its timesteps.

So let's say these are the input values, of shape (2, 3, 4):

array([[[  1,   2,   3,   4],
        [  5,   6,   7,   8],
        [ 10,  11,  12,  13]],

       [[ 14,  15,  16,  17],
        [ 18,  19,  20,  21],
        [ 22,  23,  24,  25]]])

And this would be a random dropout mask of shape (2, 1, 4) (1 means keep, 0 means drop):

array([[[ 1,  1,  1,  0]],

       [[ 1,  0,  0,  1]]])

So you have these two masks (one per sample). They then get broadcast along the timestep axis:

array([[[ 1,  1,  1,  0],
        [ 1,  1,  1,  0],
        [ 1,  1,  1,  0]],

       [[ 1,  0,  0,  1],
        [ 1,  0,  0,  1],
        [ 1,  0,  0,  1]]])

and applied to the input:

array([[[  1,   2,   3,   0],
        [  5,   6,   7,   0],
        [ 10,  11,  12,   0]],

       [[ 14,   0,   0,  17],
        [ 18,   0,   0,  21],
        [ 22,   0,   0,  25]]])
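The same broadcast can be reproduced in a few lines of NumPy (this illustrates only the masking; real Dropout additionally rescales the kept values by 1/(1 - rate)):

import numpy as np

x = np.array([[[ 1,  2,  3,  4],
               [ 5,  6,  7,  8],
               [10, 11, 12, 13]],
              [[14, 15, 16, 17],
               [18, 19, 20, 21],
               [22, 23, 24, 25]]])   # shape (2, 3, 4)

mask = np.array([[[1, 1, 1, 0]],
                 [[1, 0, 0, 1]]])    # shape (2, 1, 4)

print(x * mask)                      # mask broadcasts along the timestep axis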

Question 2:

I'm not sure about your second question to be honest.

Edit: What you can do is take the first dimension of the shape of the input, which should be the batch_size, as proposed in this github issue:

import tensorflow as tf

...

batch_size = tf.shape(inp)[0]
drop1 = Dropout(0.1, noise_shape=[batch_size, max1._keras_shape[1], 1, 1])(max1)

As you can see, I'm on the TensorFlow backend. I don't know whether Theano has the same problem, but if it does, you might be able to solve it with the equivalent Theano shape function.
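Putting it together with the model from the question, a minimal sketch (assuming the standalone Keras of that era on the TensorFlow backend; note that data_format='channels_first' is also passed to MaxPooling2D here, and the sketch is untested):

import tensorflow as tf
from keras.layers import Input, Conv2D, MaxPooling2D, Dropout
from keras.models import Model

inp = Input([1, 100, 2])
conv1 = Conv2D(64, kernel_size=(11, 2), strides=(1, 1),
               data_format='channels_first')(inp)
max1 = MaxPooling2D((2, 1), data_format='channels_first')(conv1)

# Resolve the batch size as a symbolic tensor so that None never has
# to appear in noise_shape; the mask is shared over the last two axes.
batch_size = tf.shape(inp)[0]
drop1 = Dropout(0.1,
                noise_shape=[batch_size, max1._keras_shape[1], 1, 1])(max1)

model = Model(inputs=inp, outputs=drop1)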

answered Sep 21 '22 by Nima Mousavi