 

How to monitor gradient vanish and explosion in keras with tensorboard?

I would like to monitor the gradient changes in TensorBoard with Keras to decide whether the gradients are vanishing or exploding. What should I do?

Joey Chia asked Apr 26 '18



People also ask

How do you monitor gradients?

To check for vanishing or exploding gradients, pay attention to the gradient distribution and absolute values in the layer of interest (the "Distributions" tab): if the distribution is highly peaked and concentrated around 0, the gradients are probably vanishing. Here's a concrete example of how this looks in practice.

How do we counter exploding gradient problems in recurrent neural networks?

Another popular technique to mitigate the exploding gradients problem is to clip the gradients during backpropagation so that they never exceed some threshold. This is called gradient clipping. In Keras, an optimizer created with clipvalue=1.0 will clip every component of the gradient vector to a value between –1.0 and 1.0.
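For instance, a minimal sketch (assuming the Keras 2.x optimizer API, where clipvalue and clipnorm are standard arguments; the toy model is just a placeholder):

```python
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

# Tiny placeholder model, only to show where the clipping is configured.
model = Sequential([Dense(1, input_shape=(10,))])

# clipvalue=1.0 clips every component of the gradient to [-1.0, 1.0];
# clipnorm=1.0 would instead rescale the whole gradient vector when its norm exceeds 1.
model.compile(optimizer=SGD(lr=0.01, clipvalue=1.0), loss='mse')
```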

How do you deal with vanishing gradients?

The vanishing gradient problem is caused by the derivative of the activation function used in the network. The simplest solution is to replace the activation function: instead of sigmoid, use an activation function such as ReLU.
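For example, a minimal sketch (Keras 2.x assumed; layer sizes are placeholders) that uses ReLU in the hidden layers instead of sigmoid:

```python
from keras.models import Sequential
from keras.layers import Dense

model = Sequential([
    Dense(64, activation='relu', input_shape=(20,)),  # 'relu' instead of 'sigmoid'
    Dense(64, activation='relu'),
    Dense(1, activation='sigmoid'),                   # sigmoid only at the output
])
model.compile(optimizer='adam', loss='binary_crossentropy')
```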


1 Answer

To visualize the training in TensorBoard, add a keras.callbacks.TensorBoard callback to the model.fit call. Don't forget to set write_grads=True to see the gradients there. Right after training starts, you can run...

tensorboard --logdir=/full_path_to_your_logs

... from the command line and point your browser to http://localhost:6006. See the example code in this question.
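Here's a minimal sketch of the setup (assuming standalone Keras 2.x, where the TensorBoard callback still accepts write_grads; the model and data are random placeholders):

```python
import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Dense

model = Sequential([
    Dense(64, activation='relu', input_shape=(20,)),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='sgd', loss='binary_crossentropy')

# histogram_freq must be > 0 for histograms (and hence gradients) to be logged;
# write_grads=True adds gradient histograms to the "Histograms"/"Distributions" tabs.
tb = keras.callbacks.TensorBoard(
    log_dir='./logs',
    histogram_freq=1,
    write_grads=True,
)

x = np.random.rand(256, 20)
y = np.random.randint(0, 2, size=(256, 1))

# Histogram logging needs validation data, hence the validation_split.
model.fit(x, y, epochs=5, validation_split=0.2, callbacks=[tb])
```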

To check for vanishing or exploding gradients, pay attention to the gradient distribution and absolute values in the layer of interest (the "Distributions" tab):

  • If the distribution is highly peaked and concentrated around 0, the gradients are probably vanishing. Here's a concrete example of how this looks in practice.
  • If the distribution is rapidly growing in absolute value over time, the gradients are exploding. Often the output values of that layer become NaNs very quickly as well.
Maxim answered Sep 17 '22
