I was wondering what the difference is between the Activation layer and the Dense layer in Keras.
Since the Activation layer seems to be a fully connected layer, and Dense has a parameter to pass an activation function, what is the best practice?
Let's imagine a fictional network like this: Input -> Dense -> Dropout -> Final Layer. Should the final layer be Dense(activation=softmax) or Activation(softmax)? Which is cleanest, and why?
Thanks everyone!
The Dense layer is the regular, densely connected neural network layer. It is the most common and frequently used layer. A Dense layer performs the following operation on the input and returns the output:

output = activation(dot(input, kernel) + bias)
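For concreteness, here is a minimal sketch (assuming TensorFlow's bundled Keras; the shapes and the relu activation are made up for illustration) that checks this formula against the layer's own output:

import numpy as np
import tensorflow as tf

# Hypothetical example: a Dense layer with 4 units and a relu activation.
layer = tf.keras.layers.Dense(4, activation="relu")
x = tf.random.normal((2, 3))   # batch of 2 samples, 3 features each
y = layer(x)                   # calling the layer builds kernel and bias

# Reproduce output = activation(dot(input, kernel) + bias) by hand.
manual = tf.nn.relu(tf.matmul(x, layer.kernel) + layer.bias)
print(np.allclose(y.numpy(), manual.numpy()))  # True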
For example, the selu activation multiplies scale (> 1) with the output of the keras.activations.elu function to ensure a slope larger than one for positive inputs. The values of alpha and scale are chosen so that the mean and variance of the inputs are preserved between two consecutive layers, as long as the weights are initialized correctly (see tf.keras.initializers.LecunNormal).
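If you do use selu, a hedged sketch of that pairing (the layer sizes here are arbitrary, not from the original post):

import tensorflow as tf

# Sketch only: selu is paired with lecun_normal initialization so that
# the mean/variance-preserving property described above can hold.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32,)),
    tf.keras.layers.Dense(64, activation="selu", kernel_initializer="lecun_normal"),
    tf.keras.layers.Dense(10),
])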
Activation functions are a critical part of the design of a neural network. The choice of activation function in the hidden layer will control how well the network model learns the training dataset. The choice of activation function in the output layer will define the type of predictions the model can make.
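As an illustration (the sizes and tasks are invented, not from the question), it is the output activation that changes with the prediction type:

import tensorflow as tf

# Multi-class classification: softmax outputs a probability distribution.
multiclass_head = tf.keras.layers.Dense(10, activation="softmax")
# Binary classification: sigmoid outputs a single probability.
binary_head = tf.keras.layers.Dense(1, activation="sigmoid")
# Regression: no activation (linear) outputs an unbounded real value.
regression_head = tf.keras.layers.Dense(1, activation=None)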
Yes, it is the same. model.add(Dense(10, activation=None)) in Keras and nn.Linear(128, 10) in PyTorch behave the same way here: neither applies an activation. If you don't specify anything, no activation is applied.
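A quick sketch (shapes made up) confirming that the output is then purely the affine map, with no nonlinearity:

import numpy as np
import tensorflow as tf

layer = tf.keras.layers.Dense(10, activation=None)  # same as omitting activation
x = tf.random.normal((5, 128))
y = layer(x)

# The output equals x @ kernel + bias exactly: nothing was activated.
print(np.allclose(y.numpy(), (tf.matmul(x, layer.kernel) + layer.bias).numpy()))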
Using Dense(activation=softmax) is computationally equivalent to first adding Dense and then adding Activation(softmax). However, there is one advantage of the second approach: you can retrieve the outputs of the last Dense layer (before the activation) from a model defined this way. With the first approach, that is impossible.
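A minimal sketch of that retrieval (the shapes and layer sizes are made up), using the functional API so the pre-softmax logits stay addressable:

import tensorflow as tf

# Hypothetical model built with a separate Activation layer.
inputs = tf.keras.Input(shape=(20,))
x = tf.keras.layers.Dense(32, activation="relu")(inputs)
x = tf.keras.layers.Dropout(0.5)(x)
logits = tf.keras.layers.Dense(10)(x)                # outputs before activation
outputs = tf.keras.layers.Activation("softmax")(logits)
model = tf.keras.Model(inputs, outputs)

# Because softmax lives in its own layer, the logits can be read out:
logit_model = tf.keras.Model(inputs, logits)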
As @MarcinMożejko said, it is equivalent. I just want to explain why. If you look at the Dense Keras documentation page, you'll see that the default activation function is None.

A Dense layer mathematically is:

a = g(W.T * a_prev + b)

where g is an activation function. When using Dense(units=k, activation=softmax), it computes all the quantities in one shot. When doing Dense(units=k) and then Activation('softmax'), it first calculates the quantity W.T * a_prev + b (because the default activation function is None), and then applies the activation function specified in the Activation layer to that quantity.
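To see the equivalence numerically (a sketch with invented shapes), copy the same weights into both variants and compare:

import numpy as np
import tensorflow as tf

x = tf.random.normal((4, 8))

# Variant 1: activation fused into the Dense layer.
fused = tf.keras.layers.Dense(3, activation="softmax")
y1 = fused(x)

# Variant 2: linear Dense followed by a separate Activation layer.
linear = tf.keras.layers.Dense(3)            # default activation is None
linear.build(x.shape)
linear.set_weights(fused.get_weights())      # share the same kernel and bias
y2 = tf.keras.layers.Activation("softmax")(linear(x))

print(np.allclose(y1.numpy(), y2.numpy()))   # True: identical outputs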