First: I am only a few days in with TensorFlow, so please bear with me.
I started out from the cifar10 tutorial code, and I am now using a combination of convolutions and eigenvalue decompositions that breaks the symbolic differentiation. That is, the graph gets built, but upon calling train() the script halts with "No gradient defined for operation [...] (op type: SelfAdjointEig)". No surprise there.
The inputs to the subgraph in question are still only the input feature maps and the filters being used. I have the formulas for the gradients at hand, and they should be straightforward to implement given the subgraph's inputs and the gradient with respect to its output.
From what I can see in the docs, I can register a gradient method for custom ops with RegisterGradient, or override them with the experimental gradient_override_map. Both of those should give me access to exactly the things I need. For example, searching on GitHub I find a lot of examples that access the op's inputs as op.inputs[0] and so on.
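For reference, this is roughly how I understand that registration mechanism to work (a minimal sketch; the op-type name "MyEigGrad" and the backward formula are placeholders, not my actual gradient):

import tensorflow as tf

# A registered gradient function receives the forward op plus the incoming
# gradient, and must return one gradient tensor per op input.
@tf.RegisterGradient("MyEigGrad")            # the registration name is chosen freely
def _my_eig_grad(op, grad):
    x = op.inputs[0]                         # the op's forward inputs are available here
    return grad * tf.ones_like(x)            # placeholder backward formula

x_input = tf.placeholder(tf.float32, [None, 3])
# gradient_override_map reroutes an existing op type to the registered function.
g = tf.get_default_graph()
with g.gradient_override_map({"Identity": "MyEigGrad"}):
    y = tf.identity(x_input)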
The problem I have is that I want to essentially "shortcut" a whole subgraph, not a single op, so I have no single op to decorate. Since this is happening in one of the convolutional layers of the cifar example, I tried using the scope object for that layer. Conceptually, what enters and exits that scope is exactly what I want, so if I could somehow override the whole scope's gradients, that would already do it.
I saw tf.Graph.create_op, which (I think) I could use to register a new type of operation, and I could then override that operation type's gradient computation with the aforementioned methods. But I don't see a way of defining that op's forward pass without writing it in C++...
Maybe I am approaching this the wrong way entirely? Since all of my forward and backward operations can be implemented with the Python interface, I obviously want to avoid implementing anything in C++.
Here's a trick from Sergey Ioffe:
Suppose you want a group of ops that behaves as f(x) in forward mode, but as g(x) in backward mode. You implement it as
t = g(x)
y = t + tf.stop_gradient(f(x) - t)
So in your case, your g(x) could be an identity op with a custom gradient registered via gradient_override_map.
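A rough sketch of how the two pieces might fit together (my_forward_subgraph and my_backward_formula below are illustrative placeholders for your actual subgraph and its analytic gradient):

import tensorflow as tf

def my_forward_subgraph(x):            # placeholder for the real f(x) (conv + SelfAdjointEig, etc.)
    return tf.square(x)

def my_backward_formula(x, dy):        # placeholder for the analytic gradient of f
    return dy * tf.ones_like(x)

@tf.RegisterGradient("MySubgraphGrad")
def _my_subgraph_grad(op, grad):
    return my_backward_formula(op.inputs[0], grad)

def block(x):
    g = tf.get_default_graph()
    with g.gradient_override_map({"Identity": "MySubgraphGrad"}):
        t = tf.identity(x)             # g(x): identity, but with the custom gradient attached
    # Forward value equals f(x); the backward pass only flows through t.
    return t + tf.stop_gradient(my_forward_subgraph(x) - t)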
From TensorFlow 1.7 onward, tf.custom_gradient is the way to go.
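A minimal sketch of that approach (the forward body and the backward formula here are placeholders standing in for your own subgraph and its analytic gradient):

import tensorflow as tf

@tf.custom_gradient
def my_block(x):
    y = tf.square(x)                   # placeholder for the real forward subgraph
    def grad(dy):
        # Return the gradient with respect to x, from the analytic formula.
        return dy * 2.0 * x            # placeholder backward formula (true gradient of square)
    return y, grad

x = tf.placeholder(tf.float32, [None, 3])
dy_dx = tf.gradients(tf.reduce_sum(my_block(x)), x)   # uses the custom gradient above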