
Why is ReLU used in regression with Neural Networks?

I am following the official TensorFlow Keras tutorial and got stuck at this step: Predict house prices: regression - Create the model

Why is an activation function used for a task where a continuous value is predicted?

The code is:

import tensorflow as tf
from tensorflow import keras

def build_model():
    # two hidden layers with ReLU, plus a single output unit
    model = keras.Sequential([
        keras.layers.Dense(64, activation=tf.nn.relu,
                           input_shape=(train_data.shape[1],)),
        keras.layers.Dense(64, activation=tf.nn.relu),
        keras.layers.Dense(1)
    ])

    optimizer = tf.train.RMSPropOptimizer(0.001)  # TF 1.x optimizer, as in the tutorial

    model.compile(loss='mse', optimizer=optimizer, metrics=['mae'])
    return model
asked Jul 20 '18 by Popovici Andrei-Sorin

People also ask

Why do we use ReLU in neural network?

ReLU stands for Rectified Linear Unit. The main advantage of using the ReLU function over other activation functions is that it does not activate all the neurons at the same time.
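
A minimal sketch of that sparsity (assuming TensorFlow 2.x, which is not what the question itself uses): ReLU returns 0 for every negative pre-activation, so only some of the units produce a non-zero output for a given input.

import tensorflow as tf

# ReLU zeroes out negative pre-activations, so only part of the layer "fires"
pre_activations = tf.constant([-2.0, -0.5, 0.0, 0.7, 3.1])
print(tf.nn.relu(pre_activations).numpy())  # [0.  0.  0.  0.7 3.1]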

Which activation function is used for regression in neural network?

In a regression problem, we use the linear (identity) activation function with one node. In a binary classifier, we use the sigmoid activation function with one node. In a multiclass classification problem, we use the softmax activation function with one node per class.
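
A minimal sketch of those three output layers in Keras (assuming TensorFlow 2.x; the layer sizes and the 10-class example are only illustrative):

from tensorflow import keras

regression_head = keras.layers.Dense(1)                         # linear (identity) output
binary_head     = keras.layers.Dense(1, activation='sigmoid')   # probability of the positive class
multiclass_head = keras.layers.Dense(10, activation='softmax')  # one node per class (10 here)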

Why ReLU is better than sigmoid for neural networks?

Models trained with ReLU typically converge much faster than models trained with the sigmoid function, and so take much less time to train. Because of this quick convergence, a ReLU-trained model can also start to overfit sooner. Overall, model performance is significantly better when trained with ReLU.

Why is ReLU so effective?

In fact, this is exactly ReLU's advantage: it can bend an otherwise linear function at a chosen point and to a chosen degree. Combined with the weights and biases of the previous layer, a ReLU unit can place that bend at any location and with any slope, which is what lets stacks of such units model non-linear relationships.
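
A minimal sketch of that bend (plain Python, illustrative numbers only): for a single unit relu(w*x + b), the output stays at 0 until x = -b/w and then grows linearly with slope w.

def relu(z):
    return max(0.0, z)

w, b = 2.0, -4.0                   # bend located at x = -b/w = 2.0
for x in [0.0, 1.0, 2.0, 3.0, 4.0]:
    print(x, relu(w * x + b))      # 0.0, 0.0, 0.0, 2.0, 4.0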


1 Answer

The general reason for using non-linear activation functions in hidden layers is that, without them, no matter how many layers or how many units per layer, the network would behave just like a simple linear unit. This is nicely explained in this short video by Andrew Ng: Why do you need non-linear activation functions?
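
A minimal NumPy sketch of that point: composing two layers with no activation collapses into a single linear map, because W2 @ (W1 @ x + b1) + b2 is just (W2 @ W1) @ x + (W2 @ b1 + b2).

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5,))
W1, b1 = rng.normal(size=(4, 5)), rng.normal(size=(4,))
W2, b2 = rng.normal(size=(3, 4)), rng.normal(size=(3,))

two_linear_layers = W2 @ (W1 @ x + b1) + b2
one_linear_layer  = (W2 @ W1) @ x + (W2 @ b1 + b2)
print(np.allclose(two_linear_layers, one_linear_layer))  # True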

In your case, looking more closely, you'll see that the activation function of your final layer is not relu, as in your hidden layers, but the linear one (which is the default activation when you don't specify anything, as here):

keras.layers.Dense(1)

From the Keras docs:

Dense

[...]

Arguments

[...]

activation: Activation function to use (see activations). If you don't specify anything, no activation is applied (ie. "linear" activation: a(x) = x).

which is indeed what is expected for a regression network with a single continuous output.
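
A minimal sketch (assuming TensorFlow 2.x Keras) checking that Dense(1) with no activation behaves exactly like an explicit 'linear' activation, so the regression output remains an unbounded continuous value:

import numpy as np
from tensorflow import keras

inputs = np.random.rand(3, 4).astype('float32')

default_head = keras.layers.Dense(1)                       # no activation specified
linear_head  = keras.layers.Dense(1, activation='linear')

# build both layers and copy the weights so they compute the same function
default_head.build((None, 4))
linear_head.build((None, 4))
linear_head.set_weights(default_head.get_weights())

print(np.allclose(default_head(inputs), linear_head(inputs)))  # True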

answered Oct 12 '22 by desertnaut