
How does theano implement computing every function's gradient?

I have a question about Theano's implementation: how does Theano compute the gradient of an arbitrary loss function from the following call to T.grad? Thank you for your help.

 gparams = T.grad(cost, self.params) 
Issac asked Dec 20 '22


2 Answers

Edit: this answer was wrong in saying that Theano uses Symbolic Differentiation. My apologies.

Theano implements reverse mode autodiff, but confusingly they call it "symbolic differentiation". This is misleading because symbolic differentiation is something quite different. Let's look at both.

Symbolic differentiation: given a graph representing a function f(x), it uses the chain rule to compute a new graph representing the derivative of that function f'(x). They call this "compiling" f(x). One problem with symbolic differentiation is that it can output a very inefficient graph, but Theano automatically simplifies the output graph.

Example:

"""
f(x) = x*x + x - 2
Graph =
          ADD
         /   \
        MUL  SUB
       /  \  /  \
       x  x  x  2

Chain rule for ADD=> (a(x)+b(x))' = a'(x) + b'(x)
Chain rule for MUL=> (a(x)*b(x))' = a'(x)*b(x) + a(x)*b'(x)
Chain rule for SUB=> (a(x)-b(x))' = a'(x) - b'(x)
The derivative of x is 1, and the derivative of a constant is 0.

Derivative graph (not optimized yet) =
          ADD
         /   \
       ADD    SUB
      /  |    |  \
   MUL  MUL   1   0
  /  |  |  \
 1   x  x   1

Derivative graph (after optimization) =
          ADD
         /   \
       MUL    1
      /   \
     2     x

So: f'(x) = 2*x + 1
"""

Reverse mode autodiff: works in two passes over the computation graph: a forward pass (from the inputs to the outputs), then a backward pass that applies the chain rule from the output back to the inputs (if you are familiar with backpropagation, this is exactly how it computes gradients).
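A minimal sketch of reverse-mode autodiff in plain Python (again, an illustration, not Theano's implementation): each node stores its forward value plus its parents with their local derivatives, and the backward pass pushes gradients back through the graph with the chain rule.

class Node:
    def __init__(self, value, parents=()):
        self.value = value          # result of the forward pass
        self.grad = 0.0             # filled in by the backward pass
        self.parents = parents      # list of (parent_node, local_derivative) pairs

    def backward(self, seed=1.0):
        # accumulate d(output)/d(self), then propagate to parents via the chain rule
        self.grad += seed
        for parent, local_grad in self.parents:
            parent.backward(seed * local_grad)

def add(a, b): return Node(a.value + b.value, [(a, 1.0), (b, 1.0)])
def sub(a, b): return Node(a.value - b.value, [(a, 1.0), (b, -1.0)])
def mul(a, b): return Node(a.value * b.value, [(a, b.value), (b, a.value)])

# f(x) = x*x + x - 2 evaluated at x = 3 (the forward pass happens as the graph is built)
x = Node(3.0)
f = add(mul(x, x), sub(x, Node(2.0)))
f.backward()                 # backward pass starting from the output
print(f.value, x.grad)       # 10.0 7.0, since f(3) = 10 and f'(3) = 2*3 + 1 = 7

A real implementation visits nodes in reverse topological order rather than recursing, so shared intermediate results are traversed only once.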

See this great post for more details on the various automatic differentiation approaches and their pros and cons.

MiniQuark answered Dec 22 '22


Look up automatic differentiation, and in particular its backwards (reverse) mode, which is what is used to evaluate gradients efficiently.

Theano is, as far as I can see, a hybrid between the code-rewriting and the operator-based approaches. It uses operator overloading in Python to construct the computational graph, then optimizes that graph and generates from it (optimized) sequences of operations that evaluate the required kinds of derivatives.
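With Theano's actual API, that workflow looks roughly like this: operator overloading on symbolic variables builds the graph, T.grad extends it with the derivative nodes, and theano.function optimizes the graph and compiles it into a callable.

import theano
import theano.tensor as T

x = T.dscalar('x')          # symbolic scalar; overloaded operators build the graph
y = x * x + x - 2           # f(x) = x^2 + x - 2
gy = T.grad(y, x)           # symbolic graph for f'(x)

f_prime = theano.function([x], gy)   # graph optimization and code generation happen here
print(f_prime(3.0))                  # 7.0, since f'(x) = 2*x + 1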

Lutz Lehmann answered Dec 22 '22