I think I grasp the basics of dropout and how to implement it with the TensorFlow API. But the normalization that is tied to the dropout probability in tf.nn.dropout (the kept elements are scaled up by 1/keep_prob) does not seem to be part of DropConnect. Is that correct? If so, does the normalization do any "harm", or can I simply apply tf.nn.dropout to my weights to implement DropConnect?
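To make concrete what I mean by that normalization, here is a quick sketch (using the same TF 1.x API as below): tf.nn.dropout zeroes each element with probability 1 - keep_prob and scales the survivors by 1/keep_prob, so the expected value is unchanged.
import tensorflow as tf

# Illustration of the normalization built into tf.nn.dropout (TF 1.x API):
# kept elements are scaled by 1/keep_prob, dropped elements become 0.
x = tf.ones( [1, 10] )
dropped = tf.nn.dropout( x, keep_prob=0.5 )

with tf.Session() as sess:
    print( sess.run( dropped ) )   # survivors print as 2.0, the rest as 0.0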
Yes, you can use tf.nn.dropout to do DropConnect: just wrap your weight matrix in tf.nn.dropout instead of the result of the matrix multiplication. Because tf.nn.dropout scales the surviving entries up by 1/keep_prob, you can undo that scaling by multiplying by keep_prob, like so
dropConnect = tf.nn.dropout( m1, keep_prob ) * keep_prob
Here is a code example that learns the XOR function using DropConnect. The regular dropout version is included as a commented-out line, so you can swap it in and compare the output.
### imports
import tensorflow as tf
### constant data
x = [[0.,0.],[1.,1.],[1.,0.],[0.,1.]]
y_ = [[1.,0.],[1.,0.],[0.,1.],[0.,1.]]
### induction
# Layer 0 = the x2 inputs
x0 = tf.constant( x , dtype=tf.float32 )
y0 = tf.constant( y_ , dtype=tf.float32 )
keep_prob = tf.placeholder( dtype=tf.float32 )
# Layer 1 = the 2x12 hidden sigmoid
m1 = tf.Variable( tf.random_uniform( [2,12] , minval=0.1 , maxval=0.9 , dtype=tf.float32 ))
b1 = tf.Variable( tf.random_uniform( [12] , minval=0.1 , maxval=0.9 , dtype=tf.float32 ))
########## DROP CONNECT
# - use this to perform the "DropConnect" flavor of dropout
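# - tf.nn.dropout scales the kept weights by 1/keep_prob, so multiplying by keep_prob restores their original magnitude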
dropConnect = tf.nn.dropout( m1, keep_prob ) * keep_prob
h1 = tf.sigmoid( tf.matmul( x0, dropConnect ) + b1 )
########## DROP OUT
# - uncomment this to use "regular" dropout
#h1 = tf.nn.dropout( tf.sigmoid( tf.matmul( x0,m1 ) + b1 ) , keep_prob )
# Layer 2 = the 12x2 softmax output
m2 = tf.Variable( tf.random_uniform( [12,2] , minval=0.1 , maxval=0.9 , dtype=tf.float32 ))
b2 = tf.Variable( tf.random_uniform( [2] , minval=0.1 , maxval=0.9 , dtype=tf.float32 ))
y_out = tf.nn.softmax( tf.matmul( h1,m2 ) + b2 )
# loss : sum of the squares of y0 - y_out
loss = tf.reduce_sum( tf.square( y0 - y_out ) )
# training step : discovered learning rate of 1e-2 through experimentation
train = tf.train.AdamOptimizer(1e-2).minimize(loss)
### training
# run 5000 times using all the X and Y
# print out the loss and any other interesting info
with tf.Session() as sess:
    sess.run( tf.global_variables_initializer() )
    print( "\nloss" )
    for step in range(5000) :
        sess.run( train, feed_dict={keep_prob: 0.5} )
        if (step + 1) % 100 == 0 :
            print( sess.run( loss, feed_dict={keep_prob: 1.} ) )

    results = sess.run( [m1, b1, m2, b2, y_out, loss], feed_dict={keep_prob: 1.} )
    labels  = "m1,b1,m2,b2,y_out,loss".split(",")

    for label, result in zip( labels, results ) :
        print( "" )
        print( label )
        print( result )
        print( "" )
Both flavors are able to separate the inputs into the correct output classes:
y_out
[[ 7.05891490e-01 2.94108540e-01]
[ 9.99605477e-01 3.94574134e-04]
[ 4.99370173e-02 9.50062990e-01]
[ 4.39682379e-02 9.56031740e-01]]
Here you can see that the DropConnect version correctly classified Y as true, true, false, false.
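As an aside, if you are on TensorFlow 2.x the same trick still works; tf.nn.dropout there takes a rate argument (the drop probability) rather than keep_prob, and there are no placeholders or sessions. A minimal sketch under that assumption:
import tensorflow as tf  # assumes TF 2.x

def drop_connect( w, rate ):
    # tf.nn.dropout scales survivors by 1/(1 - rate); multiplying by (1 - rate) undoes it
    return tf.nn.dropout( w, rate=rate ) * (1.0 - rate)

x  = tf.constant( [[0., 0.], [1., 1.], [1., 0.], [0., 1.]] )
m1 = tf.Variable( tf.random.uniform( [2, 12], minval=0.1, maxval=0.9 ) )
b1 = tf.Variable( tf.random.uniform( [12], minval=0.1, maxval=0.9 ) )
h1 = tf.sigmoid( tf.matmul( x, drop_connect( m1, rate=0.5 ) ) + b1 )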