I am following the official TensorFlow with Keras tutorial and I got stuck here: Predict house prices: regression - Create the model
Why is an activation function used for a task where a continuous value is predicted?
The code is:
def build_model():
    model = keras.Sequential([
        keras.layers.Dense(64, activation=tf.nn.relu,
                           input_shape=(train_data.shape[1],)),
        keras.layers.Dense(64, activation=tf.nn.relu),
        keras.layers.Dense(1)
    ])

    optimizer = tf.train.RMSPropOptimizer(0.001)

    model.compile(loss='mse', optimizer=optimizer, metrics=['mae'])
    return model
ReLU stands for Rectified Linear Unit. The main advantage of using the ReLU function over other activation functions is that it does not activate all the neurons at the same time: any unit receiving a negative input outputs zero, so only part of the network is active for a given input.
In a regression problem, we use the linear (identity) activation function with one node. In a binary classifier, we use the sigmoid activation function with one node. In a multiclass classification problem, we use the softmax activation function with one node per class.
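To make that distinction concrete, here is a minimal sketch of the three typical output layers (my own illustration, not from the tutorial; the 5-class multiclass case is just an assumed example):

    from tensorflow import keras

    # Regression: one unit, linear (identity) activation, unbounded continuous output.
    regression_head = keras.layers.Dense(1)

    # Binary classification: one unit, sigmoid squashes the output into (0, 1).
    binary_head = keras.layers.Dense(1, activation='sigmoid')

    # Multiclass classification: one unit per class, softmax turns the outputs
    # into a probability distribution over the classes.
    multiclass_head = keras.layers.Dense(5, activation='softmax')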
In a side-by-side comparison, the model trained with ReLU converges quickly and thus takes much less time than the same model trained with the sigmoid function. That quick convergence also makes overfitting appear more clearly in the ReLU model, yet its overall performance is significantly better.
In fact, this is exactly ReLU's advantage: it bends an otherwise linear function at a specific point. Combined with the weights and biases coming from the previous layer, each ReLU unit can place such a bend at any location and with any slope, and summing many units produces a flexible piecewise-linear function.
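Here is a rough numpy sketch (my own illustration, not part of this answer) of that idea: three hand-picked ReLU units each contribute a bend at a different point, and a linear output layer mixes them into a piecewise-linear function:

    import numpy as np

    def relu(z):
        return np.maximum(0.0, z)

    x = np.linspace(-3, 3, 7)

    # Each hidden unit bends the line at a different location (-1, 0 and 1).
    hidden = np.stack([relu(x + 1.0),   # bend at x = -1
                       relu(x + 0.0),   # bend at x =  0
                       relu(x - 1.0)])  # bend at x =  1

    # The linear output layer mixes the bends; the result is piecewise linear in x.
    output_weights = np.array([0.5, -1.0, 2.0])
    y = output_weights @ hidden
    print(np.round(y, 2))  # [0.  0.  0.  0.5 0.  1.5 3. ]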
The general reason for using non-linear activation functions in hidden layers is that, without them, no matter how many layers or how many units per layer, the whole network collapses into a single linear transformation. This is nicely explained in this short video by Andrew Ng: Why do you need non-linear activation functions?
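A quick numpy check (my own example) of that collapse: two stacked layers with no activation in between compute exactly the same function as a single linear layer with combined weights and bias:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 3))                          # batch of 4 samples, 3 features
    W1, b1 = rng.normal(size=(3, 5)), rng.normal(size=5)
    W2, b2 = rng.normal(size=(5, 1)), rng.normal(size=1)

    # Two "layers" with no non-linearity in between...
    two_layers = (x @ W1 + b1) @ W2 + b2

    # ...equal one linear layer with merged weights and bias.
    one_layer = x @ (W1 @ W2) + (b1 @ W2 + b2)

    print(np.allclose(two_layers, one_layer))            # True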
In your case, looking more closely, you'll see that the activation function of your final layer is not the relu used in your hidden layers, but the linear one (which is the default activation when you don't specify anything, as here):
keras.layers.Dense(1)
From the Keras docs:
Dense
[...]
Arguments
[...]
activation: Activation function to use (see activations). If you don't specify anything, no activation is applied (i.e. "linear" activation: a(x) = x).
which is indeed what is expected for a regression network with a single continuous output.
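For completeness, a tiny sketch (my own, not from the docs) of the final layer with the default written out explicitly; the two layers below are equivalent:

    from tensorflow import keras

    implicit = keras.layers.Dense(1)                       # no activation given -> identity
    explicit = keras.layers.Dense(1, activation='linear')  # same layer, spelled out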