We know that we can use tf.stop_gradient(B) to prevent a variable B from being trained during backpropagation. But I have no idea how to stop B for only a certain loss.
To put it simply, assume our loss is:

```python
loss = categorical_crossentropy + my_loss
```

where both categorical_crossentropy and my_loss depend on B. So, if we set B = tf.stop_gradient(B), both of them will take B as a constant.
But how do I make only my_loss stop the gradient w.r.t. B, and leave categorical_crossentropy unchanged? Something like B = tf.stop_gradient(B, myloss).
My code for that would be:

```python
my_loss = ...
B = tf.stop_gradient(B)
categorical_crossentropy = ...
loss = categorical_crossentropy + my_loss
```

Will that work? Or, how can I make it work?
Okay, if Q1 can be solved, my final question is: how do I do that in a custom layer? To be specific, assume we have a custom layer which has trainable weights w and B and its own loss my_loss for this layer only.
```python
class My_Layer(keras.layers.Layer):
    def __init__(self, **kwargs):
        super(My_Layer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.w = self.add_weight(name='w', trainable=True)
        self.B = self.add_weight(name='B', trainable=True)
        my_loss = self.w * self.B
        # tf.stop_gradient(self.w)
        self.add_loss(my_loss)
```
How do I make w trainable only for the model loss (MSE, crossentropy, etc.), and B trainable only for my_loss? If I add that tf.stop_gradient(w), will it stop w for my_loss only, or for the final loss of the model?
Question 1
When you run y = tf.stop_gradient(x), you create a StopGradient operation whose input is x and output is y. This operation behaves like an identity, i.e. the value of x is the same as the value of y, except that gradients don't flow from y to x.
If you want to have gradients flow to B only from some losses, you can simply do:

```python
B_no_grad = tf.stop_gradient(B)
loss1 = get_loss(B)          # B will be updated because of loss1
loss2 = get_loss(B_no_grad)  # B will not be updated because of loss2
```
Things should become clear when you think about the computation graph you are building. stop_gradient allows you to create an "identity" node for any tensor (not just a variable) that does not let gradients flow through it.
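To make the graph picture concrete, here is a minimal runnable sketch (my own illustration, not code from the question) using tf.GradientTape: the loss built on the stop_gradient copy yields no gradient with respect to B, while the other loss does.

```python
import tensorflow as tf

B = tf.Variable(2.0)

with tf.GradientTape(persistent=True) as tape:
    B_no_grad = tf.stop_gradient(B)  # identity in the forward pass, blocks gradients
    loss1 = tf.square(B)             # reaches B through the graph
    loss2 = tf.square(B_no_grad)     # sees B only as a constant

print(tape.gradient(loss1, B))  # tf.Tensor(4.0, ...) -> loss1 would update B
print(tape.gradient(loss2, B))  # None                -> loss2 would not update B
```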
Question 2
I don't know how to do this when you specify the model loss with a string (e.g. model.compile(loss='categorical_crossentropy', ...)), because you don't control its construction. However, you can do it by adding losses using add_loss, or by building a model-level loss yourself from the model outputs. For the former, just create some losses using the plain variables and some using the *_no_grad versions, add them all using add_loss(), and compile your model with loss=None.
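As a rough sketch of how this could look in the custom layer from the question (my own illustration, assuming the layer's forward output should involve only w, so the compiled model loss updates w, while the added my_loss should update only B):

```python
import tensorflow as tf
from tensorflow import keras

class My_Layer(keras.layers.Layer):
    def build(self, input_shape):
        # scalar weights, as in the question
        self.w = self.add_weight(name='w', trainable=True)
        self.B = self.add_weight(name='B', trainable=True)

    def call(self, inputs):
        # my_loss treats w as a constant, so it only produces gradients for B
        my_loss = tf.stop_gradient(self.w) * self.B
        self.add_loss(my_loss)
        # the output involves only w, so the compiled model loss only updates w
        return self.w * inputs

# hypothetical usage: the compiled MSE trains w, the added my_loss trains B
inputs = keras.Input(shape=(1,))
outputs = My_Layer()(inputs)
model = keras.Model(inputs, outputs)
model.compile(optimizer='sgd', loss='mse')
```

Because my_loss wraps w in tf.stop_gradient and B never appears in the layer's output, each weight receives gradients from exactly one of the two loss terms.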