I'm trying to use Theano to compute the Hessian of a function with respect to a vector as well as a couple of scalars (edit: that is, I essentially want the scalars appended to the vector with respect to which I am computing the Hessian). Here's a minimal example:
import theano
import theano.tensor as T
A = T.vector('A')
b,c = T.scalars('b','c')
y = T.sum(A)*b*c
My first try was:
hy = T.hessian(y,[A,b,c])
Which fails with AssertionError: tensor.hessian expects a (list of) 1 dimensional variable as 'wrt'
My second try was to combine A, b, and c with:
wrt = T.concatenate([A,T.stack(b,c)])
hy = T.hessian(y,[wrt])
Which fails with DisconnectedInputError: grad method was asked to compute the gradient with respect to a variable that is not part of the computational graph of the cost, or is used only by a non-differentiable operator: Join.0
What is the correct way to compute the hessian in this case?
Update: To clarify on what I am looking for, suppose A is a 2 element vector. Then the Hessian would be:
[[d2y/d2A1, d2y/dA1dA2, d2y/dA1dB, d2y/dA1dC],
[d2y/dA2dA1, d2y/d2A2, d2y/dA2dB, d2y/dA2dC],
[d2y/dBdA1, d2y/dBdA2, d2y/d2B, d2y/dBdC],
[d2y/dCdA1, d2y/dCdA2, d2y/dCdB, d2y/d2C]]
which for the example function y should be:
[[0, 0, C, B],
[0, 0, C, B],
[C, C, 0, A1+A2],
[B, B, A1+A2, 0]]
So if we were to define a function:
f = theano.function([A,b,c], hy)
then, assuming we could compute hy successfully, we would expect the output:
f([1,1], 4, 5) =
[[0, 0, 5, 4],
[0, 0, 5, 4],
[5, 5, 0, 2],
[4, 4, 2, 0]]
In my actual application, A has 25 elements and y is more complicated, but the idea is the same.
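For reference, the expected matrix above can be cross-checked without Theano. The following is only a sketch in plain Python: it assumes the inputs are packed into one flat list x = [A..., b, c], and the helper names y_flat and numerical_hessian are made up for this illustration (they are not Theano functions). It approximates the Hessian with central finite differences, so the result matches the expected matrix only up to floating-point error.

```python
def y_flat(x):
    """Example function y = sum(A) * b * c on a flat list x = [A..., b, c]."""
    *A, b, c = x
    return sum(A) * b * c


def numerical_hessian(f, x, h=1e-3):
    """Approximate the Hessian of f at x with central finite differences."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                # Evaluate f with x[i] moved by di*h and x[j] moved by dj*h.
                p = list(x)
                p[i] += di * h
                p[j] += dj * h
                return f(p)
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)
    return H


# Same point as in the question: A = [1, 1], b = 4, c = 5.
H = numerical_hessian(y_flat, [1.0, 1.0, 4.0, 5.0])
for row in H:
    # Adding 0.0 turns a possible -0.0 into 0.0 for cleaner printing.
    print([round(v, 4) + 0.0 for v in row])
```

Because y is linear in each individual variable, the finite-difference formula carries no truncation error here, which makes this a convenient sanity check for whatever symbolic Hessian Theano ends up producing. For a more complicated y the step size h would need more care.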
If you pass b and c as vectors, it should work. The hessian operator expects 1D arrays. Even though scalars should work, too, it is probably easiest to just provide the type of input it likes.
The reason your stacking fails is that the stack operation yields a new variable that is not an input (end node) of the graph y was built from. It sits on a separate branch: y depends on A, b and c directly, not on the concatenated variable, so Theano cannot differentiate y with respect to it and raises the error instead.
This works for me:
import theano.tensor as T
A = T.vector('A')
b,c = T.vectors('b','c')
y = T.sum(A)*b[0]*c[0]
hy = T.hessian(y,[A,b,c])
Based on a suggestion from @eickenberg to combine the inputs at the numpy level, I used the following workaround:
import theano
import theano.tensor as T
A,temp = T.vectors('A','T')
b,c = T.scalars('b','c')
y = T.sum(A)*b*c
y2 = theano.clone(y,{A:temp[:-2],b:temp[-2],c:temp[-1]})
hy = T.hessian(y2,[temp])
f = theano.function([temp], hy)
f([1,1,4,5])
gives the expected output:
> [array([[ 0., 0., 5., 4.],
> [ 0., 0., 5., 4.],
> [ 5., 5., 0., 2.],
> [ 4., 4., 2., 0.]])]
This works but feels rather awkward. If anyone knows of a better (more general) solution, please let me know.