Keras: usage of the Activation layer instead of the activation parameter

There is an Activation layer in Keras.

It seems that this code:

  model.add(Convolution2D(64, 3, 3))
  model.add(Activation('relu'))

and this one:

  model.add(Convolution2D(64, 3, 3, activation='relu'))

produce the same result.

What is the purpose of this additional Activation layer?

[Update 2017-04-10] Is there a difference in performance between the above two scenarios?

Asked Apr 06 '17 by Leonid Ganeline




1 Answer

As you can see, both approaches are equivalent. Here are a few scenarios in which having this separate layer can help:

  1. Same layer, different activations - one can easily imagine a network where different activations are applied to the same layer output. Without a separate Activation layer this is impossible (see the sketch after this list).
  2. Need for the output before activation - e.g. in siamese networks you train your network with softmax as the final activation, but in the end you want the so-called logits, i.e. the values fed into the softmax. Without an additional Activation layer that would be difficult.
  3. Saliency maps: similar to the previous point, you also need the output before activation in order to compute a gradient w.r.t. it - without a separate Activation layer that wouldn't be possible.
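
For illustration, here is a minimal sketch of these scenarios using the functional API. It assumes tf.keras 2.x; the layer names, input shape, and chosen activations are mine, not part of the original question or answer:

  # Minimal sketch, assuming tf.keras 2.x; names and shapes are illustrative.
  import tensorflow as tf
  from tensorflow.keras import layers, Model

  inputs = tf.keras.Input(shape=(32, 32, 3))
  # Convolution without a built-in activation, so its linear output stays accessible.
  pre_activation = layers.Conv2D(64, (3, 3), name="conv_linear")(inputs)

  # Scenario 1: different activations applied to the same pre-activation output.
  relu_branch = layers.Activation("relu")(pre_activation)
  tanh_branch = layers.Activation("tanh")(pre_activation)
  model = Model(inputs, [relu_branch, tanh_branch])

  # Scenarios 2-3: a model that exposes the pre-activation output (the "logits").
  logits_model = Model(inputs, pre_activation)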

As you can see, without an Activation layer the output of a layer before activation and its final activation are strongly coupled. That's why Activation can be quite useful - it breaks that coupling.
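
As a concrete example of point 3, a gradient w.r.t. the pre-activation output can be taken from the logits_model sketched above. This is again only a sketch and uses tf.GradientTape, which postdates the original answer:

  # Sketch: gradient of a pre-activation unit w.r.t. the input, as used in saliency maps.
  x = tf.random.normal((1, 32, 32, 3))
  with tf.GradientTape() as tape:
      tape.watch(x)                     # x is a plain tensor, so watch it explicitly
      pre_act = logits_model(x)         # pre-activation output from the sketch above
      score = tf.reduce_max(pre_act)    # pick a scalar to differentiate
  saliency = tape.gradient(score, x)    # same shape as the input image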

Answered Oct 25 '22 by Marcin Możejko