ReLU Function In A Recurrent Neural Network. Weights Become Infinity or Zero

I am new to machine learning. I've read that the ReLU function is better than a sigmoid function for a recurrent neural network because of the vanishing gradient problem.

I'm trying to implement a very basic recurrent neural network with 3 input nodes, 10 hidden nodes and 3 output nodes.

I use the ReLU function at both the input and hidden nodes and the softmax function at the output nodes.

However, when I use the ReLU function, after a few epochs (fewer than 10) the error either goes to 0 or blows up to infinity, depending on whether the weight changes are added to or subtracted from the original weights.

weight = weight + gradient_descent # weights hit infinity
weight = weight - gradient_descent # weights become 0

Also, because the weights hit infinity, it gives the following warning:

RuntimeWarning: invalid value encountered in maximum
  return np.maximum(x, 0)

However, when I implement the sigmoid function, the error comes down nicely. That is fine for this simple example, but I am afraid that if I use it on a bigger problem I will run into the vanishing gradient problem.

Is this caused by the small number of hidden nodes, and how can I solve this issue? If you need the code sample please comment; I am not posting the code here because it is too long.

Thank you.

asked Sep 07 '25 by rksh


1 Answer

I do not think the number of hidden nodes is the problem.

In the first case the weights approach infinity because the gradient descent update is wrong. The gradient of the loss with respect to a weight points in the direction in which you should move the weight in order to increase the loss. Since one (usually) wants to minimize the loss, the weights must be moved in the opposite direction; updating them in the positive (gradient) direction increases the loss and very likely leads to divergence.
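
As a minimal sketch (hypothetical variable names, not the asker's actual code), the sign of the update in NumPy would look like this:

import numpy as np

# Hypothetical toy example: one weight matrix and its gradient dL/dW from backprop.
rng = np.random.default_rng(0)
weight = rng.normal(scale=0.1, size=(10, 3))
gradient = rng.normal(scale=0.1, size=(10, 3))
learning_rate = 0.01

# Correct update: step *against* the gradient to decrease the loss.
weight -= learning_rate * gradient

# Adding the gradient instead climbs the loss surface and usually diverges:
# weight += learning_rate * gradient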

That said, even assuming the update is correct, I would see this more as a wrong initialization/hyperparameter setting than as a strictly ReLU-dependent problem (obviously ReLU explodes in its positive part, giving infinity, while the sigmoid saturates, giving 1).

In the second case, what is happening is the dead ReLU problem: a saturated ReLU that always gives the same (zero) output and is not able to recover. It can happen for many reasons (e.g. bad initialization, wrong bias learning), but the most probable one is an update step that is too large. Try decreasing your learning rate and see what happens.

In case this does not solve the problem, consider using the Leaky ReLU variant, even if only for debugging purposes.
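
For reference, here is a minimal NumPy sketch of the two activations (the 0.01 negative slope is just a common default, not a value taken from the asker's code):

import numpy as np

def relu(x):
    # Standard ReLU: zero output and zero gradient for x < 0, so a unit stuck
    # in that region receives no gradient and cannot recover (the dead ReLU case).
    return np.maximum(x, 0)

def leaky_relu(x, alpha=0.01):
    # Leaky ReLU keeps a small slope alpha for x < 0, so some gradient still
    # flows and a "dead" unit can move back into the positive region.
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))        # [0.  0.  0.  1.5]
print(leaky_relu(x))  # [-0.02  -0.005  0.  1.5]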

More (and better explained) details about the Leaky ReLU and the dead ReLU can be found here: https://datascience.stackexchange.com/questions/5706/what-is-the-dying-relu-problem-in-neural-networks

answered Sep 10 '25 by Lemm Ras