According to A Guide to TF Layers, the dropout layer goes after the last dense layer:
dense = tf.layers.dense(input, units=1024, activation=tf.nn.relu)
dropout = tf.layers.dropout(dense, rate=params['dropout_rate'],
                            training=mode == tf.estimator.ModeKeys.TRAIN)
logits = tf.layers.dense(dropout, units=params['output_classes'])
Doesn't it make more sense to place it before that dense layer, so that the layer learns the mapping from input to output with the dropout effect in place?
dropout = tf.layers.dropout(prev_layer, rate=params['dropout_rate'],
                            training=mode == tf.estimator.ModeKeys.TRAIN)
dense = tf.layers.dense(dropout, units=1024, activation=tf.nn.relu)
logits = tf.layers.dense(dense, units=params['output_classes'])
It is not an either/or situation. Informally speaking, common wisdom says to apply dropout after dense layers, and not so much after convolutional or pooling ones; so, at first glance, the answer would depend on what exactly prev_layer is in your second code snippet.
Nevertheless, this "design principle" is routinely violated nowadays (see some interesting relevant discussions on Reddit and Cross Validated); even in the MNIST CNN example included in Keras, dropout is applied both after the max pooling layer and after the dense one:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

# input_shape and num_classes are defined earlier in the Keras example
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))  # <-- dropout here
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))  # <-- and here
model.add(Dense(num_classes, activation='softmax'))
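For reference, the same structure can be sketched with the tf.layers API you are using. This is only a rough equivalent, not the example's own code; input_layer is a hypothetical placeholder, and params and mode are assumed to be defined as in your snippets:

conv1 = tf.layers.conv2d(input_layer, filters=32, kernel_size=3, activation=tf.nn.relu)
conv2 = tf.layers.conv2d(conv1, filters=64, kernel_size=3, activation=tf.nn.relu)
pool = tf.layers.max_pooling2d(conv2, pool_size=2, strides=2)
drop1 = tf.layers.dropout(pool, rate=0.25,  # dropout after pooling
                          training=mode == tf.estimator.ModeKeys.TRAIN)
flat = tf.layers.flatten(drop1)
dense = tf.layers.dense(flat, units=128, activation=tf.nn.relu)
drop2 = tf.layers.dropout(dense, rate=0.5,  # and after the dense layer
                          training=mode == tf.estimator.ModeKeys.TRAIN)
logits = tf.layers.dense(drop2, units=params['output_classes'])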
So, both your code snippets are valid, and we can easily imagine a third valid option as well:
dropout = tf.layers.dropout(prev_layer, [...])
dense = tf.layers.dense(dropout, units=1024, activation=tf.nn.relu)
dropout2 = tf.layers.dropout(dense, [...])
logits = tf.layers.dense(dropout2, units=params['output_classes'])
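Whichever placement you end up with, keep in mind that the training argument is what actually activates dropout; at inference time the layer is just the identity, which is why all the snippets gate it on mode == tf.estimator.ModeKeys.TRAIN. A minimal sketch demonstrating this, assuming a TF 2.x setup where a Keras layer can be called eagerly:

import tensorflow as tf  # assuming TF 2.x

layer = tf.keras.layers.Dropout(rate=0.5)
x = tf.ones((1, 8))

print(layer(x, training=True))   # roughly half the entries zeroed, survivors scaled by 1/(1 - rate)
print(layer(x, training=False))  # identity: all ones, dropout is inactive at inference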
As a piece of general advice: tutorials such as the one you link to are only trying to get you familiar with the tools and the (very) general principles, so "overinterpreting" the solutions shown there is not recommended...