How does Keras (or any other ML framework) calculate the gradient of a lambda function layer for backpropagation?

Keras enables adding a layer which calculates a user-defined lambda function. What I don't get is how Keras knows to calculate the gradient of this user-defined function for backpropagation.
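For example, something like this (a minimal sketch; the squaring expression is just a placeholder for any user-defined function):

from keras.layers import Lambda

# A layer that applies an arbitrary user-defined expression;
# nowhere do we tell Keras what the derivative of x ** 2 is.
square_layer = Lambda(lambda x: x ** 2)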

asked Dec 26 '16 by Isaac Dorfman

People also ask

How are gradients computed in TensorFlow?

Gradient tapes: TensorFlow "records" relevant operations executed inside the context of a tf.GradientTape onto a "tape". TensorFlow then uses that tape to compute the gradients of a "recorded" computation using reverse-mode differentiation.
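A minimal sketch of that mechanism (plain tf.GradientTape usage; the function being differentiated is just an arbitrary example):

import tensorflow as tf

x = tf.Variable(3.0)

# Operations run inside the tape's context are recorded...
with tf.GradientTape() as tape:
    y = x ** 2 + tf.sin(x)

# ...and the tape is replayed in reverse to get dy/dx = 2*x + cos(x).
grad = tape.gradient(y, x)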

What does the Lambda layer do in Keras?

Lambda class. Wraps arbitrary expressions as a Layer object. The Lambda layer exists so that arbitrary expressions can be used as a Layer when constructing Sequential and Functional API models. Lambda layers are best suited for simple operations or quick experimentation.

What is Lambda in a CNN?

The Lambda layer applies a user-defined function to its input: placed inside a neural network, it transforms the data flowing through it with whatever expression or function it wraps.


1 Answer

That is one of the benefits of using Theano/TensorFlow and the libraries built on top of them: they give you automatic gradient calculation (automatic differentiation) for mathematical functions and operations.

Keras gets the gradients by calling:

# keras/theano_backend.py
def gradients(loss, variables):
    return T.grad(loss, variables)

# keras/tensorflow_backend.py
def gradients(loss, variables):
    '''Returns the gradients of `variables` (list of tensor variables)
    with regard to `loss`.
    '''
    return tf.gradients(loss, variables, colocate_gradients_with_ops=True)

which are in turn called by the optimizers (keras/optimizers.py) via grads = self.get_gradients(loss, params) to get the gradients used to write the update rule for all the params. The params here are the trainable weights of the layers. Layers created with Lambda don't have any trainable weights of their own, but they still affect the loss function through the forward pass, and hence indirectly affect the gradients computed for the trainable weights of the other layers.
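To see this concretely, here is a small sketch in the same Keras 1.x-era symbolic style as the backend code above (the toy model is made up for illustration; in eager TF 2.x you would use tf.GradientTape instead):

from keras.models import Sequential
from keras.layers import Dense, Lambda
import keras.backend as K

model = Sequential([
    Dense(4, input_shape=(3,)),
    Lambda(lambda x: x ** 2),  # contributes no trainable weights
    Dense(1),
])

# model.trainable_weights contains only the Dense kernels/biases, yet
# their gradients still depend on the Lambda layer, because autodiff
# differentiates straight through the squaring expression.
loss = K.mean(K.square(model.output))
grads = K.gradients(loss, model.trainable_weights)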

The only time you need to write a new gradient calculation is when you are defining a new basic mathematical operation/function. Also, when you write a custom loss function, autodiff almost always takes care of the gradient calculation. Optionally, you can speed up training (though not always) by implementing the analytical gradient of your custom function. For example, the softmax function can be expressed in terms of exp, sum and div, and autodiff can take care of it, but its analytical/symbolic gradient is usually implemented in Theano/TensorFlow anyway.
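To illustrate (a sketch using tf.GradientTape; the hand-rolled softmax is exactly the exp/sum/div composition mentioned above):

import tensorflow as tf

logits = tf.Variable([1.0, 2.0, 3.0])

with tf.GradientTape() as tape:
    # softmax written only with primitive ops: exp, sum, div
    e = tf.exp(logits)
    probs = e / tf.reduce_sum(e)
    loss = -tf.math.log(probs[2])  # toy cross-entropy-style loss

# Autodiff chains through exp/sum/div with no hand-written softmax
# derivative; the built-in tf.nn.softmax ships a symbolic gradient.
grads = tape.gradient(loss, logits)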

For implementing new ops, see:

http://deeplearning.net/software/theano/extending/extending_theano.html
https://www.tensorflow.org/versions/r0.12/how_tos/adding_an_op/index.html
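In more recent TensorFlow versions there is also a lightweight route: tf.custom_gradient lets you attach a hand-written backward rule to a Python function without registering a full op (a sketch; the gradient-clipping identity below is only an example):

import tensorflow as tf

@tf.custom_gradient
def clip_grad_identity(x):
    def grad(upstream):
        # Hand-written backward rule: clip the incoming gradient.
        return tf.clip_by_value(upstream, -1.0, 1.0)
    return tf.identity(x), grad

x = tf.Variable([5.0])
with tf.GradientTape() as tape:
    y = clip_grad_identity(x * 3.0)
grad = tape.gradient(y, x)  # uses the custom rule instead of autodiff's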

answered Sep 21 '22 by indraforyou