Let's say I have an NxM weight variable weights and a constant NxM matrix of 1s and 0s, mask.
If a layer of my network is defined like this (with other layers similarly defined):
masked_weights = mask * weights
layer1 = tf.nn.relu(tf.matmul(layer0, masked_weights) + biases1)
Will this network behave as if the corresponding 0s in mask are zeros in weights during training (i.e. as if the connections represented by those weights had been removed from the network entirely)?
If not, how can I achieve this goal in TensorFlow?
The answer is yes. The experiment below builds a graph of this form and checks the gradient of the loss with respect to the weights.
The implementation is:
import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=(None, 3))
weights = tf.get_variable("weights", [3, 2])
bias = tf.get_variable("bias", [2])
mask = tf.constant(np.asarray([[0, 1], [1, 0], [0, 1]], dtype=np.float32))  # constant mask
masked_weights = tf.multiply(weights, mask)  # elementwise product zeroes out the masked connections
y = tf.nn.relu(tf.nn.bias_add(tf.matmul(x, masked_weights), bias))
loss = tf.losses.mean_squared_error(tf.constant(np.asarray([[1, 1]], dtype=np.float32)), y)

# gradient of the loss with respect to the underlying (unmasked) variable
weights_grad = tf.gradients(loss, weights)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
print("Masked weights=\n", sess.run(masked_weights))
data = np.random.rand(1, 3)
print("Gradient of weights\n=", sess.run(weights_grad, feed_dict={x: data}))
sess.close()
After running the code above, you will see that the gradients are masked as well. This is just the chain rule: since masked_weights = mask * weights, the gradient with respect to weights is the gradient with respect to masked_weights multiplied elementwise by mask, so every entry where the mask is 0 receives a zero gradient. In my run, they are:
Gradient of weights
= [array([[ 0.        , -0.40866762],
          [ 0.34265977, -0.        ],
          [ 0.        , -0.35294518]], dtype=float32)]
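For reference, the same check can be written with eager execution in TensorFlow 2.x using tf.GradientTape instead of tf.gradients. This is only a minimal sketch of that variant (not part of the original answer); the variable names mirror the snippet above.

import numpy as np
import tensorflow as tf  # assumes TF 2.x (eager execution)

weights = tf.Variable(tf.random.normal([3, 2]))
bias = tf.Variable(tf.zeros([2]))
mask = tf.constant([[0., 1.], [1., 0.], [0., 1.]])  # constant mask

x = tf.constant(np.random.rand(1, 3), dtype=tf.float32)
target = tf.constant([[1., 1.]])

with tf.GradientTape() as tape:
    # the mask removes the same connections from the forward pass
    masked_weights = weights * mask
    y = tf.nn.relu(tf.matmul(x, masked_weights) + bias)
    loss = tf.reduce_mean(tf.square(target - y))

grad = tape.gradient(loss, weights)
print(grad.numpy())  # entries are zero wherever the mask is zero

Because the gradient is zero at the masked positions, a plain gradient-descent update never changes those entries of weights, and even if another update rule did move them, they would still be multiplied by 0 in the forward pass.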