Tensorflow, Keras: How to set add_loss in Keras layer with stop gradient?

Question 1

We know that we can use tf.stop_gradient(B) to prevent a variable B from being updated during backpropagation. But I have no idea how to stop gradients to B for only certain losses.

To put it simply, assume our loss is:

B = tf.stop_gradient(B)
loss = categorical_crossentropy + my_loss

where both categorical_crossentropy and my_loss depend on B. So if we apply stop_gradient to B, both of them will treat B as a constant.

But how do I stop the gradient w.r.t. B for my_loss only, leaving categorical_crossentropy unchanged? Something like B = tf.stop_gradient(B, my_loss)?

My code for that would be:

my_loss = ...
B = tf.stop_gradient(B)
categorical_crossentropy = ...
loss = categorical_crossentropy + my_loss

Will that work? If not, how can I make it work?


Question 2

Okay, if Q1 can be solved, my final question is how to do this in a custom layer.

To be specific, assume we have a custom layer with trainable weights w and B, plus its own loss my_loss that applies to this layer only.

import tensorflow as tf
from tensorflow import keras

class My_Layer(keras.layers.Layer):
    def __init__(self, **kwargs):
        super(My_Layer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.w = self.add_weight(name='w', shape=(), trainable=True)
        self.B = self.add_weight(name='B', shape=(), trainable=True)
        # layer-specific loss depending on both weights
        my_loss = self.w * self.B
        # tf.stop_gradient(self.w)
        self.add_loss(my_loss)

How do I make w trainable only through the model loss (MSE, crossentropy, etc.), and B trainable only through my_loss?

If I uncomment that tf.stop_gradient(self.w), will it stop gradients to w for my_loss only, or for the final loss of the model?

asked Oct 17 '22 by Nathan Explosion

1 Answer

Question 1

When you run y = tf.stop_gradient(x), you create a StopGradient operation whose input is x and output is y. The operation behaves like an identity: y has the same value as x, except that gradients don't flow from y back to x.
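A minimal check in eager TF 2.x (my own example, not from the original answer) makes this concrete:

import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = tf.stop_gradient(x)  # identity in the forward pass
    loss = y * y

print(y.numpy())               # 3.0 -- same value as x
print(tape.gradient(loss, x))  # None -- no gradient flows back to x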

If you want to have gradients flow to B only from some losses, you can simply do:

B_no_grad = tf.stop_gradient(B)
loss1 = get_loss(B)  # B will be updated because of loss1
loss2 = get_loss(B_no_grad)   # B will not be updated because of loss2 

Things should become clear when you think about the computation graph you are building. stop_gradient allows you to create an "identity" node for any tensor (not just variable) that does not allow gradients to flow through it.
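Putting this together as a runnable sketch (get_loss is replaced here by a simple square purely for illustration), only the loss built from the plain B contributes a gradient:

import tensorflow as tf

B = tf.Variable(2.0)
with tf.GradientTape() as tape:
    B_no_grad = tf.stop_gradient(B)
    loss1 = B ** 2           # d(loss1)/dB = 2 * B = 4.0
    loss2 = B_no_grad ** 2   # treated as a constant w.r.t. B
    total = loss1 + loss2

print(tape.gradient(total, B).numpy())  # 4.0 -- only loss1 contributes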

Question 2

I don't know how to do this while using a model loss that you specify using a string (e.g. model.compile(loss='categorical_crossentropy', ...) because you don't control its construction. However, you can do it by adding losses using add_loss or building a model-level loss yourself using model outputs. For the former, just create some losses using plain variables and some using *_no_grad versions, add them all using add_loss(), and compile your model with loss=None.
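Along those lines, here is a hedged sketch of the layer from Question 2 (the scalar shapes, initializers, and the forward computation are my assumptions; w, B, and my_loss come from the question). my_loss is built from a gradient-stopped w, so it can only update B, while the output is built from a gradient-stopped B, so the compiled model loss can only update w:

import tensorflow as tf
from tensorflow import keras

class My_Layer(keras.layers.Layer):
    def build(self, input_shape):
        self.w = self.add_weight(name='w', shape=(), initializer='ones',
                                 trainable=True)
        self.B = self.add_weight(name='B', shape=(), initializer='ones',
                                 trainable=True)

    def call(self, inputs):
        # my_loss sees a gradient-stopped w, so it only updates B.
        self.add_loss(tf.stop_gradient(self.w) * self.B)
        # The output sees a gradient-stopped B, so the model loss
        # (MSE, crossentropy, ...) only updates w.
        return inputs * self.w + tf.stop_gradient(self.B)

If every loss comes from add_loss(), compile with loss=None as described above; otherwise compile with your usual model loss and Keras will add the add_loss() terms to it.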

answered Nov 04 '22 by iga