
Dropout behavior in Keras with rate=1 (dropping all input units) not as expected

import keras

input0 = keras.layers.Input((32, 32, 3), name='Input0')
flatten = keras.layers.Flatten(name='Flatten')(input0)
relu1 = keras.layers.Dense(256, activation='relu', name='ReLU1')(flatten)
dropout = keras.layers.Dropout(1., name='Dropout')(relu1)
softmax2 = keras.layers.Dense(10, activation='softmax', name='Softmax2')(dropout)
model = keras.models.Model(inputs=input0, outputs=softmax2, name='cifar')

Just to test whether dropout is working, I set the dropout rate to 1.0.

With all hidden units dropped, the model state should be frozen in each epoch, with no tuning of the parameters at all.

However, the training accuracy keeps growing, even though I drop all the hidden nodes (see the attached training plots).

What's wrong?
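For reference, one way to check directly whether the Dropout layer really zeroes its output in training mode is to probe it with a backend function. A minimal sketch, assuming the TF-backed Keras 2 API of the time (the probe function and the random input are illustrative only):

import numpy as np
from keras import backend as K

# Run the graph up to the Dropout output with the learning
# phase forced to 1 (i.e. training mode, where dropout is active).
probe = K.function([input0, K.learning_phase()], [dropout])

out = probe([np.random.rand(1, 32, 32, 3).astype('float32'), 1])[0]
print(out.sum())  # expected: 0.0 if all units are really dropped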

asked Jan 20 '18 by Daniel H. Leung


1 Answer

Nice catch!

It would seem that the issue linked by Dennis Soemers in a comment above, Keras Dropout layer changes results with dropout=0.0, has not been fully resolved, and Keras somehow blunders when faced with a dropout rate of 1.0 [see UPDATE at the end of the post]. Modifying the model shown in the Keras MNIST MLP example:

from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import RMSprop

# x_train, y_train, x_test, y_test, and num_classes prepared
# exactly as in the Keras MNIST MLP example
model = Sequential()
model.add(Dense(512, activation='relu', use_bias=False, input_shape=(784,)))
model.add(Dropout(1.0))
model.add(Dense(512, activation='relu'))
model.add(Dropout(1.0))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(),
              metrics=['accuracy'])

model.fit(x_train, y_train,
          batch_size=128,
          epochs=3,
          verbose=1,
          validation_data=(x_test, y_test))

indeed gives a model that is being trained, despite all neurons supposedly being dropped, just as you report:

Train on 60000 samples, validate on 10000 samples
Epoch 1/3
60000/60000 [==============================] - 15s 251us/step - loss: 0.2180 - acc: 0.9324 - val_loss: 0.1072 - val_acc: 0.9654
Epoch 2/3
60000/60000 [==============================] - 15s 246us/step - loss: 0.0831 - acc: 0.9743 - val_loss: 0.0719 - val_acc: 0.9788
Epoch 3/3
60000/60000 [==============================] - 15s 245us/step - loss: 0.0526 - acc: 0.9837 - val_loss: 0.0997 - val_acc: 0.9723

Nevertheless, if you try a dropout rate of 0.99, i.e. replacing the two dropout layers in the above model with

model.add(Dropout(0.99))

then effectively no training takes place, as should be the case:

Train on 60000 samples, validate on 10000 samples
Epoch 1/3
60000/60000 [==============================] - 16s 265us/step - loss: 3.4344 - acc: 0.1064 - val_loss: 2.3008 - val_acc: 0.1136
Epoch 2/3
60000/60000 [==============================] - 16s 261us/step - loss: 2.3342 - acc: 0.1112 - val_loss: 2.3010 - val_acc: 0.1135
Epoch 3/3
60000/60000 [==============================] - 16s 266us/step - loss: 2.3167 - acc: 0.1122 - val_loss: 2.3010 - val_acc: 0.1135

UPDATE (after a comment by Yu-Yang under the OP): It seems to be a design choice (dead link now, see update below) not to do anything when the dropout rate is equal to either 0 or 1; the Dropout class becomes effective only when

if 0. < self.rate < 1.

Nevertheless, as already commented, a warning message in such cases (and a relevant note in the documentation) would arguably be a good idea.
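As a stopgap, one could wrap the layer oneself. A minimal sketch (not part of Keras; the CheckedDropout name is mine), written against the Keras 2 API of the time:

import warnings

from keras.layers import Dropout

class CheckedDropout(Dropout):
    """Dropout that warns when the rate makes the layer a silent no-op."""

    def __init__(self, rate, **kwargs):
        if not 0. < rate < 1.:
            warnings.warn('Dropout rate %s is outside (0, 1); '
                          'the layer will do nothing.' % rate)
        super(CheckedDropout, self).__init__(rate, **kwargs)

Used as a drop-in replacement for Dropout, this would have flagged both the rate=1.0 and the rate=0.0 cases at model-construction time.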

UPDATE (July 2021): There have been some changes since Jan 2018, when this answer was written; under the hood, Keras now calls tf.nn.dropout, which does not seem to allow for dropout=1 (source).
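A quick way to see this for yourself; a sketch only, since the exact error type and message may vary across TF versions:

import tensorflow as tf

x = tf.ones((4, 4))
try:
    tf.nn.dropout(x, rate=1.0)  # recent TF requires rate in [0, 1)
except ValueError as e:
    print(e)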

answered Oct 12 '22 by desertnaut