 

Understanding Regularization in Keras

Tags:

python

keras

I am trying to understand why the regularization syntax in Keras looks the way it does.

Roughly speaking, regularization is a way to reduce overfitting by adding a penalty term to the loss function, proportional to some function of the model weights. Therefore, I would expect regularization to be defined as part of the specification of the model's loss function.

However, in Keras the regularization is defined on a per-layer basis. For instance, consider this regularized DNN model:

from keras.layers import Input, Dense, Activation
from keras.models import Model
from keras.regularizers import l2  # the 0.01 factors below are illustrative

input = Input(name='the_input', shape=(None, input_shape))
x = Dense(units = 250, activation='tanh', name='dense_1', kernel_regularizer=l2(0.01), bias_regularizer=l2(0.01), activity_regularizer=l2(0.01))(input)
x = Dense(units = 28, name='dense_2', kernel_regularizer=l2(0.01), bias_regularizer=l2(0.01), activity_regularizer=l2(0.01))(x)
y_pred = Activation('softmax', name='softmax')(x)
mymodel = Model(inputs=input, outputs=y_pred)
mymodel.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])

I would have expected that the regularization arguments in the Dense layers were not needed, and that I could instead write the last line more like:

mymodel.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'], regularization='l2')

This is obviously wrong syntax, but I was hoping someone could elaborate a bit on why the regularizers are defined this way and what is actually happening when I use layer-level regularization.

The other thing I don't understand is under what circumstances I would use each (or all) of the three regularization options: kernel_regularizer, activity_regularizer, and bias_regularizer?

asked Jun 01 '18 by Sledge


People also ask

What is regularization Keras?

Regularizers allow you to apply penalties on layer parameters or layer activity during optimization. These penalties are summed into the loss function that the network optimizes. Regularization penalties are applied on a per-layer basis.
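For instance, here is a minimal sketch (tf.keras API; the 0.01 factor is an illustrative hyperparameter) showing that each layer's penalty is registered as a tensor in model.losses, which Keras sums into the training loss:

import tensorflow as tf
from tensorflow.keras import layers, regularizers

# A one-layer model with an L2 penalty on its kernel.
model = tf.keras.Sequential([
    layers.Dense(4, input_shape=(3,), kernel_regularizer=regularizers.l2(0.01)),
])

# The layer registered its penalty; Keras adds every entry of
# model.losses to the compiled loss at training time.
print(model.losses)  # one scalar tensor holding 0.01 * sum(W ** 2)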

What is L1 vs L2 regularization?

The differences between L1 and L2 regularization: L1 regularization penalizes the sum of the absolute values of the weights, whereas L2 regularization penalizes the sum of the squares of the weights. The L1 regularization solution is sparse; the L2 regularization solution is non-sparse.
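A small NumPy sketch (the weights and the 0.01 strength are arbitrary) makes the difference concrete:

import numpy as np

w = np.array([0.5, -1.0, 2.0])  # example weight vector
lam = 0.01                      # regularization strength

l1_penalty = lam * np.sum(np.abs(w))  # 0.01 * 3.5  = 0.035
l2_penalty = lam * np.sum(w ** 2)     # 0.01 * 5.25 = 0.0525

Because the L1 gradient has constant magnitude, it can drive small weights exactly to zero, which is where the sparsity comes from; the L2 gradient shrinks each weight in proportion to its size.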

How do you regularize in Keras?

To add a regularizer to a layer, you simply pass the preferred regularization technique to the layer's keyword argument 'kernel_regularizer'. The Keras regularizer implementations also accept a parameter that sets the regularization hyperparameter value.
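For example (the 0.01 factor is illustrative):

from tensorflow.keras import layers, regularizers

# Either the string shortcut with the default factor...
layers.Dense(64, kernel_regularizer='l2')
# ...or an instance with an explicit hyperparameter.
layers.Dense(64, kernel_regularizer=regularizers.l2(0.01))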

What is the concept of regularization?

Regularization is a technique for tuning the fitted function by adding a penalty term to the error function. The additional term constrains excessively fluctuating functions so that the coefficients don't take extreme values.
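A conceptual sketch of that penalized objective (the function name and the 0.01 strength are made up for illustration):

import numpy as np

def regularized_loss(data_loss, weights, lam=0.01):
    # The penalty grows with the magnitude of the coefficients,
    # discouraging extreme values (an L2 penalty here).
    penalty = sum(float(np.sum(w ** 2)) for w in weights)
    return data_loss + lam * penalty

# e.g. a data loss of 0.30 plus the penalty on two weight arrays:
loss = regularized_loss(0.30, [np.ones((2, 2)), np.ones(2)])  # 0.36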


1 Answer

Let's break down the components of your question:

  1. Your expectation of regularisation is probably in line with a feed-forward network, where, yes, the penalty term is applied to the weights of the overall network. But that is not necessarily the case when you mix RNNs with CNNs etc., so Keras opts to give fine-grained control. Perhaps, for easy setup, a model-level regularisation applied to all weights could be added to the API.

  2. When you use layer regularisation, the base Layer class actually adds the regularising term to the loss, which at training time penalises the corresponding layer's weights (or activations); see the sketch after this list.

  3. Now in Keras you can often apply regularisation to 3 different things, as in the Dense layer. Other layers have different kernels (recurrent kernels, etc.), so let's look at the ones you are interested in; roughly the same applies to all layers:

    1. kernel: this applies to the actual weights of the layer; in Dense it is the W of Wx + b.
    2. bias: the bias vector of the layer, so you can apply a different regulariser to it; the b in Wx + b.
    3. activity: applied to the output vector, the y in y = f(Wx + b).
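Here is a minimal sketch tying points 2 and 3 together (tf.keras API; the penalty factors are illustrative). Building and calling the layer registers one loss tensor per regulariser, and Keras adds each of them to the compiled loss:

import tensorflow as tf
from tensorflow.keras import layers, regularizers

# One Dense layer with independent penalties on W, b, and f(Wx + b).
layer = layers.Dense(
    10, activation='relu',
    kernel_regularizer=regularizers.l2(0.01),     # on W
    bias_regularizer=regularizers.l2(0.001),      # on b
    activity_regularizer=regularizers.l1(0.001),  # on the output f(Wx + b)
)

model = tf.keras.Sequential([layer])
y = model(tf.random.normal((4, 20)))  # the activity term needs a forward pass
print(len(model.losses))  # 3: kernel, bias, and activity terms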
answered Sep 22 '22 by nuric