I was just wondering if there is any significant difference between the use and speciality of
Dense(activation='relu')
and
keras.layers.ReLU
How and where can the latter one be used? My best guess is in a Functional API use case, but I don't know how.
Creating some Layer instance and passing the activation as a parameter, i.e. activation='relu', is the same as creating that Layer instance and then adding a separate activation layer, e.g. a ReLU instance. ReLU() is a layer that applies K.relu() to its inputs:
class ReLU(Layer):
    ...
    def call(self, inputs):
        return K.relu(inputs,
                      alpha=self.negative_slope,
                      max_value=self.max_value,
                      threshold=self.threshold)
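To make that equivalence concrete, here is a minimal sketch using tf.keras (the layer width and input shape are just illustrative assumptions); the two Sequential models below compute the same function:

from tensorflow import keras
from tensorflow.keras import layers

# Activation passed as a parameter of the Dense layer ...
model_a = keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=(10,)),
])

# ... versus the same Dense layer followed by a standalone ReLU layer.
model_b = keras.Sequential([
    layers.Dense(64, input_shape=(10,)),
    layers.ReLU(),
])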
From the Keras documentation:
Usage of activations
Activations can either be used through an Activation layer, or through the activation argument supported by all forward layers:
from keras.layers import Activation, Dense

model.add(Dense(64))
model.add(Activation('tanh'))
This is equivalent to:
model.add(Dense(64, activation='tanh'))
You can also pass an element-wise TensorFlow/Theano/CNTK function as an activation:
from keras import backend as K

model.add(Dense(64, activation=K.tanh))
Update:
Answering the OP's additional question: How and where can the latter one be used?
You can use it when you use a layer that doesn't accept an activation parameter, e.g. tf.keras.layers.Add, tf.keras.layers.Subtract, etc., but you still want a rectified output of such a layer:
added = tf.keras.layers.Add()([x1, x2])
relu = tf.keras.layers.ReLU()(added)
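Put into a complete Functional API model, a minimal sketch could look like this (the input shapes are just assumptions for illustration):

import tensorflow as tf

x1 = tf.keras.Input(shape=(32,))   # assumed shapes, for illustration only
x2 = tf.keras.Input(shape=(32,))

added = tf.keras.layers.Add()([x1, x2])     # Add has no activation argument,
rectified = tf.keras.layers.ReLU()(added)   # so ReLU is applied as its own layer

model = tf.keras.Model(inputs=[x1, x2], outputs=rectified)
model.summary()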
The most obvious use case is when you need a ReLU that is not attached to a Dense layer. For example, when implementing ResNet, the design requires a ReLU activation after summing the residual connection, as shown here:
x = layers.add([x, shortcut])
x = layers.Activation('relu')(x)
return x
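For context, a simplified sketch of such a residual block (assuming the shortcut already has `filters` channels, so no projection is needed) could look like this:

from tensorflow.keras import layers

def residual_block(x, filters):
    # Identity shortcut: assumes x already has `filters` channels.
    shortcut = x
    x = layers.Conv2D(filters, 3, padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(filters, 3, padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.add([x, shortcut])   # sum the residual connection first
    x = layers.ReLU()(x)            # then apply the activation
    return x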
It is also useful when you want to put a BatchNormalization layer between the pre-activation of a Dense layer and the ReLU activation. Similarly, when using a GlobalAveragePooling classifier (such as in the SqueezeNet architecture), you need to put a softmax activation after the GAP using Activation("softmax"), since there are no Dense layers in the network.
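Both situations might look roughly like this (layer sizes, input shapes and the class count are illustrative assumptions, not taken from any particular architecture):

import tensorflow as tf
from tensorflow.keras import layers

# 1) BatchNormalization between the Dense pre-activation and the ReLU.
inputs = tf.keras.Input(shape=(256,))
x = layers.Dense(128)(inputs)              # no activation here
x = layers.BatchNormalization()(x)
x = layers.ReLU()(x)

# 2) SqueezeNet-style head: no Dense layer; the softmax is a standalone
#    Activation applied after global average pooling.
images = tf.keras.Input(shape=(32, 32, 3))
y = layers.Conv2D(10, 1)(images)           # 10 = assumed number of classes
y = layers.GlobalAveragePooling2D()(y)
outputs = layers.Activation('softmax')(y)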
There are probably more cases; these are just a few examples.