Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to correctly implement dropout for convolution in TensorFlow

According to the original paper on Dropout said regularisation method can be applied to convolution layers often improving their performance. TensorFlow function tf.nn.dropout supports that by having a noise_shape parameter to allow the user to choose which parts of the tensors will drop out independently. However, neither the paper nor the documentation give a clear explanation of which dimensions should be kept independently, and the TensorFlow explanation of how noise_shape works is rather unclear.

only dimensions with noise_shape[i] == shape(x)[i] will make independent decisions.

I would assume that for a typical CNN layer output of the shape [batch_size, height, width, channels] we don't want individual rows or columns to drop out by themselves, but rather whole channels (which would be equivalent to a node in a fully connected NN) independently of the examples (i.e. different channels could be dropped for different examples in a batch). Am I correct in this assumption?

If so, how would one go about implementing dropout with such specificity using the noise_shape parameter? Would it be:

noise_shape=[batch_size, 1, 1, channels]

or:

noise_shape=[1, height, width, 1]
like image 308
Bruno KM Avatar asked Oct 29 '22 04:10

Bruno KM


1 Answers

from here,

For example, if shape(x) = [k, l, m, n] and noise_shape = [k, 1, 1, n], each batch and channel component will be kept independently and each row and column will be kept or not kept together.

The code may help explain this.

noise_shape = noise_shape if noise_shape is not None else array_ops.shape(x)
# uniform [keep_prob, 1.0 + keep_prob)
random_tensor = keep_prob
random_tensor += random_ops.random_uniform(noise_shape,
                                           seed=seed,
                                           dtype=x.dtype)
# 0. if [keep_prob, 1.0) and 1. if [1.0, 1.0 + keep_prob)
binary_tensor = math_ops.floor(random_tensor)
ret = math_ops.div(x, keep_prob) * binary_tensor
ret.set_shape(x.get_shape())
return ret

the line random_tensor += supports broadcast. When the noise_shape[i] is set to 1, that means all elements in this dimension will add the same random value ranged from 0 to 1. So when noise_shape=[k, 1, 1, n], each row and column in the feature map will be kept or not kept together. On the other hand, each example (batch) or each channel receives different random values and each of them will be kept independently.

like image 146
LI Xuhong Avatar answered Nov 15 '22 10:11

LI Xuhong