Details about alpha in tf.nn.leaky_relu(features, alpha=0.2, name=None)

I am trying to use leaky_relu as the activation function for my hidden layers. The parameter alpha is explained as:

slope of the activation function at x < 0

What does this mean? What effect will different values of alpha have on the results of the model?

asked Nov 16 '25 by Anthony0202


1 Answer

A deep explanation of ReLU and its variants can be found in the following links:

  1. https://machinelearningmastery.com/rectified-linear-activation-function-for-deep-learning-neural-networks/
  2. https://medium.com/@himanshuxd/activation-functions-sigmoid-relu-leaky-relu-and-softmax-basics-for-neural-networks-and-deep-8d9c70eed91e

The main drawback of the regular ReLU is that the input to the activation can become negative (due to the operations performed in the network), which leads to what is referred to as the "dying ReLU" problem:

the gradient is 0 whenever the unit is not active. This could lead to cases where a unit never activates as a gradient-based optimization algorithm will not adjust the weights of a unit that never activates initially. Further, like the vanishing gradients problem, we might expect learning to be slow when training ReLU networks with constant 0 gradients.
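
For illustration, here is a minimal sketch (assuming TensorFlow 2.x with eager execution) of that zero gradient: for negative inputs the ReLU output is 0 and, more importantly, so is its gradient, so no learning signal flows back to the unit.

    import tensorflow as tf

    x = tf.constant([-2.0, -0.5, 0.5, 2.0])
    with tf.GradientTape() as tape:
        tape.watch(x)
        y = tf.nn.relu(x)
    grad = tape.gradient(y, x)

    print(y.numpy())     # [0.  0.  0.5 2. ]
    print(grad.numpy())  # [0.  0.  1.  1. ]  -> zero gradient wherever x < 0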

So leaky ReLU replaces the flat zero part for negative inputs with a small slope, say 0.001 (referred to as "alpha"). For leaky ReLU the function is f(x) = max(alpha * x, x), e.g. f(x) = max(0.001 * x, x). Now the gradient for x < 0 is alpha (here 0.001) instead of 0, so the unit keeps receiving updates and learning can continue without reaching a dead end.
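
A small sketch (again assuming TensorFlow 2.x) showing that alpha is simply the slope applied to negative inputs, with a few illustrative alpha values:

    import tensorflow as tf

    # leaky_relu(x) = x for x >= 0, and alpha * x for x < 0
    x = tf.constant([-2.0, -0.5, 0.5, 2.0])

    for alpha in (0.001, 0.2, 0.5):          # different slopes for the negative side
        y = tf.nn.leaky_relu(x, alpha=alpha)
        print(alpha, y.numpy())

    # alpha=0.2 (the default) gives [-0.4 -0.1  0.5  2. ]:
    # negative inputs are scaled by alpha instead of being clipped to 0,
    # so their gradient is alpha rather than 0.

A larger alpha lets more signal (and gradient) through on the negative side; a very small alpha behaves almost like a plain ReLU.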

answered Nov 18 '25 by David