Keras: usage of the Activation layer instead of the activation parameter

There is an Activation layer in Keras.

It seems that this code:

  model.add(Convolution2D(64, 3, 3))
  model.add(Activation('relu'))

and this one:

  model.add(Convolution2D(64, 3, 3, activation='relu'))

produce the same result.

What is the purpose of this additional Activation layer?

[Update 2017-04-10] Is there a difference in performance between the above two scenarios?

Asked Apr 06 '17 by Leonid Ganeline




1 Answer

As you can see, both approaches are equivalent. Here are a few scenarios in which having this separate layer can help:

  1. Same layer, different activations - one can easily imagine a network where different activations are applied to the same layer output. Without a separate Activation layer this is impossible (see the sketch after this list).
  2. Need for the output before activation - e.g. in siamese networks you train your network with softmax as the final activation, but in the end you want the so-called logits, i.e. the values fed into the softmax. Without an additional Activation layer that would be difficult.
  3. Saliency maps: similar to the previous point, you also need the output before activation in order to compute a gradient w.r.t. it - without a separate Activation layer that wouldn't be possible.
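
For illustration, here is a minimal sketch of these scenarios using the functional API. It assumes tf.keras 2.x; the layer names, input shape, and chosen activations are mine, not part of the original question or answer:

  # Minimal sketch, assuming tf.keras 2.x; names and shapes are illustrative.
  import tensorflow as tf
  from tensorflow.keras import layers, Model

  inputs = tf.keras.Input(shape=(32, 32, 3))
  # Convolution without a built-in activation, so its linear output stays accessible.
  pre_activation = layers.Conv2D(64, (3, 3), name="conv_linear")(inputs)

  # Scenario 1: different activations applied to the same pre-activation output.
  relu_branch = layers.Activation("relu")(pre_activation)
  tanh_branch = layers.Activation("tanh")(pre_activation)
  model = Model(inputs, [relu_branch, tanh_branch])

  # Scenarios 2-3: a model that exposes the pre-activation output (the "logits").
  logits_model = Model(inputs, pre_activation)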

As you can see, without an Activation layer the output of a layer before activation and its final activation are strongly coupled. That's why Activation can be quite useful - it breaks that coupling.
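
As a concrete example of point 3, a gradient w.r.t. the pre-activation output can be taken from the logits_model sketched above. This is again only a sketch and uses tf.GradientTape, which postdates the original answer:

  # Sketch: gradient of a pre-activation unit w.r.t. the input, as used in saliency maps.
  x = tf.random.normal((1, 32, 32, 3))
  with tf.GradientTape() as tape:
      tape.watch(x)                     # x is a plain tensor, so watch it explicitly
      pre_act = logits_model(x)         # pre-activation output from the sketch above
      score = tf.reduce_max(pre_act)    # pick a scalar to differentiate
  saliency = tape.gradient(score, x)    # same shape as the input image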

Answered Oct 25 '22 by Marcin Możejko