I haven't been able to find a clear statement of whether tensorflow uses automatic or symbolic differentiation.
I skimmed the tensorflow paper and they mention automatic gradients, but it is unclear if they just mean symbolic gradients, as they also mention that it has that capability.
TensorFlow provides the tf.GradientTape API for automatic differentiation; that is, computing the gradient of a computation with respect to some inputs, usually tf.Variables.
The gradients are the partial derivatives of the loss with respect to each of the variables. TensorFlow pairs each gradient with the variable it is the gradient of, as members of a tuple inside a list. Displaying the shapes of the gradients and variables, as in the sketch below, confirms that each gradient has the same shape as its variable.
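A minimal sketch of this, assuming TF 2.x eager execution (the variables, names, and shapes here are illustrative, not from the original):

```python
import tensorflow as tf

# Illustrative variables; the original text refers to a model with several variables
w = tf.Variable(tf.random.normal((3, 2)), name='w')
b = tf.Variable(tf.zeros(2), name='b')
x = tf.constant([[1.0, 2.0, 3.0]])

with tf.GradientTape() as tape:
    y = x @ w + b
    loss = tf.reduce_mean(y ** 2)

# tape.gradient returns one gradient per variable, in the same order
grads = tape.gradient(loss, [w, b])
for grad, var in zip(grads, [w, b]):
    print(var.name, var.shape, grad.shape)  # each gradient has its variable's shape
```

Each (grad, var) pair produced by zip is the tuple-inside-a-list structure mentioned above.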
TensorFlow calculates derivatives using automatic differentiation. This is different from both symbolic differentiation and numerical differentiation (a.k.a. finite differences). It is less a clever piece of mathematics than a clever programming technique.
TF uses automatic differentiation, and more specifically reverse-mode automatic differentiation.
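To see what reverse mode means, here is a minimal hand-rolled sketch in plain Python (everything here is illustrative; real frameworks build and traverse this graph for you):

```python
class Node:
    """A value in the computation, plus links to its inputs."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent_node, local_gradient)
        self.grad = 0.0

def mul(a, b):
    return Node(a.value * b.value, [(a, b.value), (b, a.value)])

def add(a, b):
    return Node(a.value + b.value, [(a, 1.0), (b, 1.0)])

def backward(node, upstream=1.0):
    # Reverse pass: push gradients from the output back toward the inputs
    node.grad += upstream
    for parent, local_grad in node.parents:
        backward(parent, upstream * local_grad)

x = Node(3.0)
y = add(mul(x, x), x)  # y = x^2 + x
backward(y)
print(x.grad)          # dy/dx = 2x + 1 = 7.0
```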
There are three popular methods for computing derivatives:
Numerical differentiation relies on the definition of the derivative: f'(x) ≈ (f(x + h) - f(x)) / h, where you plug in a very small h and evaluate the function at two points. This is the most basic formula; in practice, people use other formulas which give smaller estimation error (see the sketch below for one). This way of calculating a derivative is suitable mostly if you do not know your function and can only sample it. It also requires a lot of computation for a high-dimensional function.
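A minimal sketch in plain Python, using the central-difference variant (one of those formulas with smaller estimation error; the names are illustrative):

```python
def numerical_derivative(f, x, h=1e-6):
    # Central difference: truncation error O(h^2) vs O(h) for the one-sided formula
    return (f(x + h) - f(x - h)) / (2 * h)

print(numerical_derivative(lambda x: x ** 3, 2.0))  # ~12.0 (exact: 3 * 2**2 = 12)
```

Note that each partial derivative of an n-dimensional function needs its own pair of evaluations, which is where the cost for high-dimensional functions comes from.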
Symbolic differentiation manipulates mathematical expressions. If you have ever used MATLAB or Mathematica, you have seen it in action: you type in an expression and get back its derivative as another expression.
For every elementary expression the system knows the derivative, and it uses various rules (product rule, chain rule) to calculate the derivative of the whole. It then simplifies the end result to obtain the final expression.
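As an illustration in Python, SymPy does the same thing (assuming SymPy is installed; the expression is arbitrary):

```python
import sympy

x = sympy.symbols('x')
expr = x ** 2 * sympy.sin(x)

# Product rule plus the known derivatives of x**2 and sin(x)
print(sympy.diff(expr, x))  # x**2*cos(x) + 2*x*sin(x)
```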
Automatic differentiation manipulates blocks of computer programs. A differentiator has rules for taking the derivative of each element of a program (when you define any op in core TF, you need to register a gradient for that op). It also uses the chain rule to break complex expressions into simpler ones. A sketch of how this looks in a real TF program, with some explanation, follows.
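This sketch uses tf.custom_gradient, the TF 2.x API for attaching your own gradient rule to a function; log1pexp is the usual numerically-stable example, not anything from the original question:

```python
import tensorflow as tf

@tf.custom_gradient
def log1pexp(x):
    e = tf.exp(x)
    def grad(upstream):
        # Hand-registered derivative of log(1 + e^x): e^x / (1 + e^x)
        return upstream * (1 - 1 / (1 + e))
    return tf.math.log(1 + e), grad

x = tf.constant(2.0)
with tf.GradientTape() as tape:
    tape.watch(x)           # constants are not watched automatically
    y = log1pexp(x)
print(tape.gradient(y, x))  # sigmoid(2.0) ≈ 0.8808
```

The tape applies the chain rule through whatever ops were executed, calling each op's registered gradient along the way.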
You might think that automatic differentiation is the same as symbolic differentiation (in one place they operate on math expressions, in the other on computer programs). And yes, they are sometimes very similar. But for control-flow statements (`if`, `while`, loops) the results can be very different:
Symbolic differentiation leads to inefficient code (unless carefully done) and faces the difficulty of converting a computer program into a single expression.
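To make the control-flow point concrete, a sketch assuming TF 2.x eager execution (the function is illustrative): the tape records only the branch that actually executed, so the program never has to be folded into one closed-form expression:

```python
import tensorflow as tf

def piecewise(x):
    # Data-dependent control flow: awkward to express as a single symbolic formula
    if x > 0:
        return x ** 2
    return 3 * x

x = tf.Variable(-1.5)
with tf.GradientTape() as tape:
    y = piecewise(x)
print(tape.gradient(y, x))  # 3.0: the gradient of the branch that ran
```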