
Why does almost every Activation Function Saturate at Negative Input Values in a Neural Network

This may be a very basic/trivial question.

For negative inputs:

  1. The output of the ReLU activation function is zero.
  2. The output of the Sigmoid activation function is zero.
  3. The output of the Tanh activation function is -1.

My questions are:

  1. Why do all of the above activation functions saturate for negative input values?
  2. Is there an activation function I can use if I want to predict a negative target value?

Thank you.

asked Feb 27 '20 by RakTheGeek

2 Answers

  1. True - ReLU is designed to output zero for negative values. (This can be dangerous with big learning rates, bad initialization, or very few units: all neurons can get stuck at zero and the model freezes.)

  2. False - Sigmoid only approaches zero for "very negative" inputs, not for negative inputs in general. If your inputs are between -3 and +3, you will see a very pleasant result between 0 and 1.

  3. False - Same comment as for Sigmoid. If your inputs are between -2 and 2, you will see nice results between -1 and 1.


So, the saturation problem only exists for inputs whose absolute values are too big.

By definition, the outputs are:

  • ReLU: 0 ≤ y < inf (with the kink at 0)
  • Sigmoid: 0 < y < 1 (centered at 0.5)
  • TanH: -1 < y < 1 (centered at 0)
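
A quick numeric check (an illustrative sketch, not part of the original answer) makes the point concrete: the functions only flatten out for inputs with large absolute value.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-20.0, -3.0, -1.0, 1.0, 3.0, 20.0])
print(np.round(sigmoid(x), 4))   # ≈ [0.0, 0.0474, 0.2689, 0.7311, 0.9526, 1.0]
print(np.round(np.tanh(x), 4))   # ≈ [-1.0, -0.9951, -0.7616, 0.7616, 0.9951, 1.0]
print(np.maximum(x, 0.0))        # ReLU: [0.0, 0.0, 0.0, 1.0, 3.0, 20.0]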

You might want to use a BatchNormalization layer before these activations to avoid having big values and avoid saturation.
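
For example, a minimal tf.keras sketch of that ordering (the layer sizes and input shape here are made up for illustration):

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(10,)),
    layers.Dense(64),              # pre-activation values can grow large
    layers.BatchNormalization(),   # rescales them to a moderate range
    layers.Activation("tanh"),     # so tanh is far less likely to saturate
    layers.Dense(1),               # linear output head
])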


For predicting negative outputs, tanh is the only one of the three that is capable of doing that.

You could invent a negative sigmoid, though; it's pretty easy:

from tensorflow import keras
from tensorflow.keras.layers import Activation

def neg_sigmoid(x):
    # sigmoid flipped into the (-1, 0) range
    return -keras.backend.sigmoid(x)

# use it as a layer:
Activation(neg_sigmoid)
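
Used as the final activation, this keeps the outputs in (-1, 0), so it only fits targets in that range. For instance (reusing the imports and neg_sigmoid above; the layer sizes are placeholders):

model = keras.Sequential([
    keras.layers.Dense(32, activation="relu", input_shape=(4,)),
    keras.layers.Dense(1),
    Activation(neg_sigmoid),   # outputs constrained to (-1, 0)
])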
answered Oct 17 '22 by Daniel Möller


In short, negative/positive doesn't matter for these activation functions.

  1. Sigmoid and tanh both saturate for large positive and large negative values; as stated in the comments, they are symmetric around input 0. ReLU only saturates for negative values, but I'll explain why that doesn't matter in the next question.

  2. The answer is that an activation function doesn't need to 'predict' a negative value. The point of an activation function is not to give an equation for your final prediction, but to add non-linearity to your neural network in the middle layers. You then use an appropriate function at the last layer to get the wanted output values, e.g. softmax for classification or a plain linear output for regression (see the sketch after this list).
    So because these activation functions sit in the middle, it really doesn't matter if they only output positive values even when your 'wanted' values are negative: the model will simply make the weights for the next layer negative (hence 'wanted values are negative' says nothing about the hidden activations).
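
A small sketch of that point, with a made-up regression whose targets are all negative (the data and model here are purely illustrative):

import numpy as np
from tensorflow import keras

x = np.random.rand(256, 3)
y = -(x.sum(axis=1) + 1.0)          # strictly negative targets

model = keras.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(3,)),  # hidden ReLU outputs are all >= 0
    keras.layers.Dense(1),                                        # linear output layer supplies the sign
])
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=5, verbose=0)
print(model.predict(x[:3]))         # predictions can still be negative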

So, ReLU being saturated on the negative side is no different from it being saturated on the positive side. There are activation functions that don't saturate, such as Leaky ReLU, so you may want to check those out. But the point is that positive/negative doesn't matter for these activation functions.

answered Oct 17 '22 by ddoGas