
How does theano implement computing every function's gradient?

I have a question about Theano's implementation: how does Theano compute the gradient of an arbitrary loss function from the following call to T.grad? Thank you for your help.

 gparams = T.grad(cost, self.params) 
Issac asked Dec 20 '22


2 Answers

Edit: this answer was wrong in saying that Theano uses Symbolic Differentiation. My apologies.

Theano implements reverse mode autodiff, but confusingly they call it "symbolic differentiation". This is misleading because symbolic differentiation is something quite different. Let's look at both.

Symbolic differentiation: given a graph representing a function f(x), it uses the chain rule to compute a new graph representing the derivative of that function f'(x). They call this "compiling" f(x). One problem with symbolic differentiation is that it can output a very inefficient graph, but Theano automatically simplifies the output graph.

Example:

"""
f(x) = x*x + x - 2
Graph =
          ADD
         /   \
        MUL  SUB
       /  \  /  \
       x  x  x  2

Chain rule for ADD=> (a(x)+b(x))' = a'(x) + b'(x)
Chain rule for MUL=> (a(x)*b(x))' = a'(x)*b(x) + a(x)*b'(x)
Chain rule for SUB=> (a(x)-b(x))' = a'(x) - b'(x)
The derivative of x is 1, and the derivative of a constant is 0.

Derivative graph (not optimized yet) =
          ADD
         /   \
       ADD    SUB
      /  |    |  \
   MUL  MUL   1   0
  /  |  |  \
 1   x  x   1

Derivative graph (after optimization) =
          ADD
         /   \
       MUL    1
      /   \
     2     x

So: f'(x) = 2*x + 1
"""

Reverse mode autodiff: works in two passes over the computation graph: a forward pass (from the inputs to the outputs), then a backward pass that applies the chain rule from the output back to the inputs (if you are familiar with backpropagation, this is exactly how it computes gradients).
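A minimal sketch of reverse-mode autodiff in plain Python (again, an illustration, not Theano's implementation): each node stores its forward value plus its parents with their local derivatives, and the backward pass pushes gradients back through the graph with the chain rule.

class Node:
    def __init__(self, value, parents=()):
        self.value = value          # result of the forward pass
        self.grad = 0.0             # filled in by the backward pass
        self.parents = parents      # list of (parent_node, local_derivative) pairs

    def backward(self, seed=1.0):
        # accumulate d(output)/d(self), then propagate to parents via the chain rule
        self.grad += seed
        for parent, local_grad in self.parents:
            parent.backward(seed * local_grad)

def add(a, b): return Node(a.value + b.value, [(a, 1.0), (b, 1.0)])
def sub(a, b): return Node(a.value - b.value, [(a, 1.0), (b, -1.0)])
def mul(a, b): return Node(a.value * b.value, [(a, b.value), (b, a.value)])

# f(x) = x*x + x - 2 evaluated at x = 3 (the forward pass happens as the graph is built)
x = Node(3.0)
f = add(mul(x, x), sub(x, Node(2.0)))
f.backward()                 # backward pass starting from the output
print(f.value, x.grad)       # 10.0 7.0, since f(3) = 10 and f'(3) = 2*3 + 1 = 7

A real implementation visits nodes in reverse topological order rather than recursing, so shared intermediate results are traversed only once.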

See this great post for more details on the various automatic differentiation approaches and their pros and cons.

MiniQuark answered Dec 22 '22


Look up automatic differentiation, and in particular its backwards (reverse) mode, which is what is used to evaluate gradients efficiently.

Theano is, as far as I can see, a hybrid between the code-rewriting and the operator-based approaches. It uses operator overloading in Python to construct the computational graph, then optimizes that graph and generates from it (optimized) sequences of operations that evaluate the required kinds of derivatives.
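With Theano's actual API, that workflow looks roughly like this: operator overloading on symbolic variables builds the graph, T.grad extends it with the derivative nodes, and theano.function optimizes the graph and compiles it into a callable.

import theano
import theano.tensor as T

x = T.dscalar('x')          # symbolic scalar; overloaded operators build the graph
y = x * x + x - 2           # f(x) = x^2 + x - 2
gy = T.grad(y, x)           # symbolic graph for f'(x)

f_prime = theano.function([x], gy)   # graph optimization and code generation happen here
print(f_prime(3.0))                  # 7.0, since f'(x) = 2*x + 1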

Lutz Lehmann answered Dec 22 '22