Problem: a very long RNN net
N1 -- N2 -- ... -- N100
For an optimizer like AdamOptimizer, compute_gradients() returns gradients for all trainable variables. However, they might explode at some step.
A method like the one in how-to-effectively-apply-gradient-clipping-in-tensor-flow can clip the large final gradients. But how can the intermediate ones be clipped?
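For reference, that approach only clips the final, fully back-propagated gradients, roughly like this (a sketch using the TF1-style optimizer API; the toy variable, loss, and the clip norm of 5.0 are just placeholders):

import tensorflow as tf

w = tf.Variable([1.0, 2.0])                      # toy variable and loss
loss = tf.reduce_sum(tf.square(w))
optimizer = tf.train.AdamOptimizer(learning_rate=1e-3)
grads_and_vars = optimizer.compute_gradients(loss)
grads, variables = zip(*grads_and_vars)
clipped, _ = tf.clip_by_global_norm(grads, 5.0)  # clip only the final gradients
train_op = optimizer.apply_gradients(list(zip(clipped, variables)))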
One way might be to do the backprop manually from "N100 --> N99", clip the gradients, then "N99 --> N98", and so on, but that's far too complicated.
So my question is: is there any easier method to clip the intermediate gradients? (Of course, strictly speaking, they are no longer gradients in the mathematical sense.)
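You can use tf.custom_gradient to define an op that is the identity on the forward pass but clips whatever gradient flows backward through it: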
import tensorflow as tf

@tf.custom_gradient
def gradient_clipping(x):
    # Identity on the forward pass; clip the incoming gradient to norm 10.0 on backprop.
    return x, lambda dy: tf.clip_by_norm(dy, 10.0)
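A rough sketch of how it could be inserted into a manually unrolled loop (cell, inputs, and initial_state are placeholders, and a cell with a single state tensor is assumed):

# Hypothetical unrolled RNN: clip the gradient of the state between steps.
state = initial_state
for x_t in inputs:                    # one input tensor per time step
    output, state = cell(x_t, state)  # any cell returning a single state tensor
    state = gradient_clipping(state)  # dL/d(state) gets clipped during backprop

Because gradient_clipping is the identity in the forward direction, the forward computation is unchanged; only the gradient passing backward through each step is limited in norm.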