How to freeze/lock weights of one TensorFlow variable (e.g., one CNN kernel of one layer)

I have a TensorFlow CNN model that performs well, and I would like to implement it in hardware (an FPGA). It's a relatively small network, but ideally it would be smaller still. With that goal, I examined the kernels and found that some have quite strong weights while others aren't doing much at all (their values are all close to zero). This occurs specifically in layer 2, which corresponds to the tf.Variable() named "W_conv2". W_conv2 has shape [3, 3, 32, 32]. I would like to set the values of W_conv2[:, :, 29, 13] to zero and freeze/lock them there, so that the rest of the network can be trained to compensate. Zeroing this kernel effectively removes/prunes it from the hardware implementation, which achieves the goal stated above.

I have found similar questions whose suggestions generally revolve around one of two approaches:

Suggestion #1:

    tf.Variable(some_initial_value, trainable = False)

Implementing this suggestion freezes the entire variable. I want to freeze just a slice, specifically W_conv2[:, :, 29, 13].

Suggestion #2:

    Optimizer = tf.train.RMSPropOptimizer(0.001).minimize(loss, var_list=var_list)

Again, implementing this suggestion does not allow the use of slices. For instance, if I try the inverse of my stated goal (optimize only a single kernel of a single variable) as follows:

    Optimizer = tf.train.RMSPropOptimizer(0.001).minimize(loss, var_list = W_conv2[:,:,0,0])

I get the following error:

    NotImplementedError: ('Trying to optimize unsupported type ', <tf.Tensor 'strided_slice_2228:0' shape=(3, 3) dtype=float32>)

So slicing a tf.Variable isn't possible in the way I've tried it here. The only thing I've tried that comes close to what I want is .assign(), but the way I've implemented it (after the model is trained) is extremely inefficient, cumbersome, and caveman-like:

    for _ in range(10000):
        # get a new batch of data
        # reset the values of W_conv2[:,:,29,13]=0 each time through
        for m in range(3):
            for n in range(3):
                assign_op = W_conv2[m,n,29,13].assign(0)
                sess.run(assign_op)
        # re-train the rest of the network
        _, loss_val = sess.run([optimizer, loss], feed_dict = {
                                   dict_stuff_here
                               })
        print(loss_val)

The model was started in Keras then moved to TensorFlow since Keras didn't seem to have a mechanism to achieve the desired results. I'm starting to think that TensorFlow doesn't allow for pruning but find this hard to believe; it just needs the correct implementation.

asked Feb 28 '17 20:02

JHarchanko

1 Answer

A possible approach is to initialize these specific weights with zeros, and modify the minimization process such that gradients won't be applied to them. It can be done by replacing the call to minimize() with something like:

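    import numpy as np
    import tensorflow as tf

    # Mask of ones with zeros at the kernel to freeze; float32 to match the gradients.
    W_conv2_weights = np.ones((3, 3, 32, 32), dtype=np.float32)
    W_conv2_weights[:, :, 29, 13] = 0
    W_conv2_weights_const = tf.constant(W_conv2_weights)

    optimizer = tf.train.RMSPropOptimizer(0.001)

    # tf.gradients returns a list, so take its single element before masking.
    W_conv2_orig_grad = tf.gradients(loss, W_conv2)[0]
    W_conv2_grad = tf.multiply(W_conv2_weights_const, W_conv2_orig_grad)
    W_conv2_train_op = optimizer.apply_gradients([(W_conv2_grad, W_conv2)])

    # rest_of_vars is assumed to be a list of all the other trainable variables.
    rest_grads = tf.gradients(loss, rest_of_vars)
    rest_train_op = optimizer.apply_gradients(zip(rest_grads, rest_of_vars))

    train_op = tf.group(rest_train_op, W_conv2_train_op)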

That is:

1. Prepare a constant tensor for canceling the appropriate gradients.
2. Compute the gradients for W_conv2 only, multiply them element-wise with the constant W_conv2_weights to zero the appropriate entries, and only then apply them.
3. Compute and apply the gradients "normally" for the rest of the variables.
4. Group the two train ops into a single training op.
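
To see why masking the gradient is enough to keep the pruned weights at exactly zero, here is a minimal toy sketch in plain Python (not TensorFlow). It assumes a simplified RMSProp-style update without momentum; the names rmsprop_step, weights, mask, targets and cache are hypothetical stand-ins and not part of the model above. The entry whose mask value is 0 plays the role of W_conv2[:, :, 29, 13].

```python
# Toy illustration of gradient masking with a simplified RMSProp-style update.
# Everything here is a hypothetical stand-in for the real model and optimizer.

def rmsprop_step(weights, grads, cache, lr=0.01, decay=0.9, eps=1e-10):
    """Apply one RMSProp-style update in place and return the weights."""
    for i, g in enumerate(grads):
        # Running average of the squared gradient for this entry.
        cache[i] = decay * cache[i] + (1.0 - decay) * g * g
        weights[i] -= lr * g / (cache[i] + eps) ** 0.5
    return weights

targets = [1.0, -2.0, 3.0, 0.5]   # minimum of the toy loss sum((w - t)^2)
weights = [0.2, 0.1, 0.0, -0.3]   # weights[2] starts at zero (the "pruned" entry)
mask = [1.0, 1.0, 0.0, 1.0]       # 0 marks the entry to freeze
cache = [0.0] * len(weights)

for _ in range(2000):
    # Gradient of sum((w - t)^2) with respect to each weight is 2 * (w - t).
    grads = [2.0 * (w - t) for w, t in zip(weights, targets)]
    # Zero the frozen entry's gradient before the optimizer sees it.
    masked = [m * g for m, g in zip(mask, grads)]
    rmsprop_step(weights, masked, cache)

print(weights)
```

The masked entry receives a zero gradient on every step, so its update is zero and it never leaves its initial value, while the other entries are still free to move toward their targets. Note that this only holds if the optimizer adds nothing beyond the gradient-driven step for that entry (for example, no leftover momentum from earlier training and no weight decay), which is an assumption of this sketch.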

answered Nov 29 '22 06:11

zohar.kom

Tags: tensorflow, pruning
