
Combining scalars and vectors in Theano for computing Hessian

I'm trying to use Theano to compute the Hessian of a function with respect to a vector as well as a couple of scalars (edit: that is, I essentially want the scalars appended to the vector that I am computing the Hessian with respect to). Here's a minimal example:

import theano
import theano.tensor as T
A = T.vector('A')
b,c = T.scalars('b','c')
y = T.sum(A)*b*c

My first try was:

hy = T.hessian(y,[A,b,c])

This fails with AssertionError: tensor.hessian expects a (list of) 1 dimensional variable as 'wrt'

My second try was to combine A, b, and c with:

wrt = T.concatenate([A,T.stack(b,c)])
hy = T.hessian(y,[wrt])

This fails with DisconnectedInputError: grad method was asked to compute the gradient with respect to a variable that is not part of the computational graph of the cost, or is used only by a non-differentiable operator: Join.0

What is the correct way to compute the hessian in this case?

Update: To clarify what I am looking for, suppose A is a 2-element vector. Then the Hessian would be:

[[d2y/dA1^2,  d2y/dA1dA2, d2y/dA1db, d2y/dA1dc],
 [d2y/dA2dA1, d2y/dA2^2,  d2y/dA2db, d2y/dA2dc],
 [d2y/dbdA1,  d2y/dbdA2,  d2y/db^2,  d2y/dbdc],
 [d2y/dcdA1,  d2y/dcdA2,  d2y/dcdb,  d2y/dc^2]]

which for the example function y should be:

[[0, 0, c,       b      ],
 [0, 0, c,       b      ],
 [c, c, 0,       A1 + A2],
 [b, b, A1 + A2, 0      ]]
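For a general length-n vector A, the same pattern follows from the first derivatives of the example y. Below is a worked derivation added for reference (it uses only the example y = sum(A)*b*c from the question and the packing order [A..., b, c]):

```latex
\frac{\partial y}{\partial A_i} = bc, \qquad
\frac{\partial y}{\partial b} = c\sum_{k} A_k, \qquad
\frac{\partial y}{\partial c} = b\sum_{k} A_k

H =
\begin{bmatrix}
0_{n\times n} & c\,\mathbf{1}_n & b\,\mathbf{1}_n \\
c\,\mathbf{1}_n^{\top} & 0 & \sum_{k} A_k \\
b\,\mathbf{1}_n^{\top} & \sum_{k} A_k & 0
\end{bmatrix}
```

With n = 2, A = [1, 1], b = 4 and c = 5 this gives exactly the 4x4 matrix expected from f([1,1], 4, 5).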

So if we were to define a function:

f = theano.function([A,b,c], hy)

then, assuming we could compute hy successfully, we would expect the output:

f([1,1], 4, 5) = 
    [[0, 0, 5, 4],
    [0, 0, 5, 4],
    [5, 5, 0, 2],
    [4, 4, 2, 0]]

In my actual application, A has 25 elements and y is more complicated, but the idea is the same.
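To sanity-check the expected output without Theano, here is a minimal NumPy-only sketch that packs A, b and c into one flat vector x = [A..., b, c] and estimates the Hessian with central finite differences. `y_flat` and `numerical_hessian` are hypothetical helpers written only for this check, not Theano functions, and the step `h` is an assumed value that works here because y is linear in each individual variable.

```python
import numpy as np


def y_flat(x, n):
    """Example function y = sum(A) * b * c, with A = x[:n], b = x[n], c = x[n + 1]."""
    return np.sum(x[:n]) * x[n] * x[n + 1]


def numerical_hessian(f, x, h=1e-3):
    """Central-difference estimate of the Hessian of a scalar function f at x."""
    x = np.asarray(x, dtype=float)
    m = x.size
    H = np.zeros((m, m))
    eye = np.eye(m) * h  # row i is the step vector along coordinate i
    for i in range(m):
        for j in range(m):
            ei, ej = eye[i], eye[j]
            # Four-point formula for d2f/dx_i dx_j
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H


n = 2  # length of A in the example (25 in the real application)
H = numerical_hessian(lambda x: y_flat(x, n), [1.0, 1.0, 4.0, 5.0])
print(np.round(H, 6))
```

Finite differences are only an approximation in general, so this is a cross-check of the expected matrix rather than a substitute for the symbolic Hessian.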

asked Oct 30 '22 13:10 by Kevin Zielnicki


2 Answers

If you pass b and c as vectors, it should work: T.hessian expects 1-D variables in wrt. Scalars arguably ought to work too, but it is easiest to just give it the input type it expects.

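Your stacking attempt fails because concatenate/stack creates a new intermediate variable (the Join.0 in the error). y was computed from A, b and c directly, not from that new variable, so the new variable sits on a separate branch of the graph and you cannot take derivatives of y with respect to it. Theano therefore refuses and raises the DisconnectedInputError.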

This works for me:

import theano.tensor as T
A = T.vector('A')
b,c = T.vectors('b','c')
y = T.sum(A)*b[0]*c[0]

hy = T.hessian(y,[A,b,c])
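One caveat worth checking: judging by the implementation of theano.gradient.hessian, passing a list as wrt returns a list with one Hessian per element (d2y/dA^2, d2y/db^2, d2y/dc^2), i.e. only the diagonal blocks of the full matrix in the question, not cross terms such as d2y/dA1db. The NumPy sketch below builds the full analytic Hessian for the example y and splits out its diagonal blocks; `full_hessian` and `diagonal_blocks` are hypothetical helpers for illustration only. For this y every diagonal block is zero, so all of the information lives in the off-diagonal blocks.

```python
import numpy as np


def full_hessian(A, b, c):
    """Analytic Hessian of y = sum(A) * b * c over the packed vector [A..., b, c]."""
    A = np.asarray(A, dtype=float)
    n = A.size
    H = np.zeros((n + 2, n + 2))
    H[:n, n] = H[n, :n] = c              # d2y/dA_i db = c
    H[:n, n + 1] = H[n + 1, :n] = b      # d2y/dA_i dc = b
    H[n, n + 1] = H[n + 1, n] = A.sum()  # d2y/db dc = sum(A)
    return H


def diagonal_blocks(H, sizes):
    """Split H into the square blocks on its diagonal, one per input size."""
    blocks, start = [], 0
    for s in sizes:
        blocks.append(H[start:start + s, start:start + s])
        start += s
    return blocks


H = full_hessian([1.0, 1.0], 4.0, 5.0)
blocks = diagonal_blocks(H, [2, 1, 1])  # A has 2 entries; b and c are length-1 vectors
print(H)
print([blk.tolist() for blk in blocks])  # [[[0.0, 0.0], [0.0, 0.0]], [[0.0]], [[0.0]]]
```

This is presumably why packing all inputs into a single vector, as in the other answer, is needed to obtain the full matrix with one T.hessian call.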
answered Nov 09 '22 06:11 by eickenberg


Based on a suggestion from @eickenberg to combine the inputs at the numpy level, I used the following workaround:

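import theano
import theano.tensor as T

# temp is the packed vector [A..., b, c] that the Hessian is taken with respect to
A, temp = T.vectors('A', 'temp')
b, c = T.scalars('b', 'c')

y = T.sum(A)*b*c
# Rebuild y as a function of temp only: A -> temp[:-2], b -> temp[-2], c -> temp[-1]
y2 = theano.clone(y, {A: temp[:-2], b: temp[-2], c: temp[-1]})

# wrt is a list, so hy (and the output of f) is a one-element list holding the Hessian
hy = T.hessian(y2, [temp])
f = theano.function([temp], hy)

f([1, 1, 4, 5])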

gives the expected output:

> [array([[ 0.,  0.,  5.,  4.],
>         [ 0.,  0.,  5.,  4.],
>         [ 5.,  5.,  0.,  2.],
>         [ 4.,  4.,  2.,  0.]])]

This works, but it feels rather awkward. If anyone knows of a better (more general) solution, please let me know.
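One way to make the slicing less ad hoc is to compute each input's index from its size instead of hard-coding temp[:-2], temp[-2] and temp[-1]. The NumPy sketch below shows only that bookkeeping; `make_index`, `pack` and `unpack` are hypothetical helpers, and the idea that the same indices could be used in the theano.clone replacement dict (for example {A: temp[idx[0]], b: temp[idx[1]], c: temp[idx[2]]}) is an untested assumption.

```python
import numpy as np


def make_index(sizes):
    """Return one index per input into the packed vector, plus its total length.

    sizes: None marks a scalar input, an int n marks a length-n vector input.
    Scalars get an integer index, vectors get a slice, both in packing order.
    """
    index, start = [], 0
    for s in sizes:
        if s is None:
            index.append(start)
            start += 1
        else:
            index.append(slice(start, start + s))
            start += s
    return index, start


def pack(values):
    """Concatenate scalars and 1-D arrays into a single float vector."""
    return np.concatenate([np.atleast_1d(np.asarray(v, dtype=float)) for v in values])


def unpack(flat, index):
    """Recover the individual inputs from the packed vector."""
    return [flat[i] for i in index]


idx, total = make_index([2, None, None])  # A (length 2), then scalars b and c
flat = pack([[1, 1], 4, 5])
A, b, c = unpack(flat, idx)
print(idx, total)  # [slice(0, 2, None), 2, 3] 4
```

For the 25-element case the layout would be make_index([25, None, None]); only the sizes list changes, not the slicing code.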

answered Nov 09 '22 07:11 by Kevin Zielnicki