TensorFlow: Compute Hessian matrix (and higher order derivatives)

I would like to be able to compute higher-order derivatives of my loss function. At the very least I would like to be able to compute the Hessian matrix. At the moment I am computing a numerical approximation of the Hessian, but this is more expensive and, more importantly, as far as I understand, inaccurate if the matrix is ill-conditioned (has a very large condition number).
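For context, a minimal sketch of the kind of finite-difference approximation I mean, using central differences (loss, w, and eps are placeholder names, nothing TensorFlow-specific):

import numpy as np

def numerical_hessian(loss, w, eps=1e-4):
    # Central-difference approximation of the Hessian of a scalar-valued
    # loss at the point w (a 1-D numpy array).
    n = w.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i = np.zeros(n); e_i[i] = eps
            e_j = np.zeros(n); e_j[j] = eps
            # second-order central difference for d^2 loss / dw_i dw_j
            H[i, j] = (loss(w + e_i + e_j) - loss(w + e_i - e_j)
                       - loss(w - e_i + e_j) + loss(w - e_i - e_j)) / (4 * eps ** 2)
    return H

This needs O(n^2) loss evaluations, and the truncation and round-off errors blow up when the Hessian is ill-conditioned, which is exactly why I want an exact (symbolic) computation.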

Theano implements this through symbolic looping, see here, but TensorFlow does not seem to support symbolic control flow yet, see here. A similar issue has been raised on the TF GitHub page, see here, but it looks like nobody has followed up on it for a while.

Is anyone aware of more recent developments or of ways to compute higher-order derivatives (symbolically) in TensorFlow?

asked by stefano

2 Answers

Well, you can, with a little effort, compute the Hessian matrix!

Suppose you have two variables:

import numpy as np
import tensorflow as tf

x = tf.Variable(np.random.random_sample(), dtype=tf.float32)
y = tf.Variable(np.random.random_sample(), dtype=tf.float32)

and a function defined using these two variables:

f = tf.pow(x, cons(2)) + cons(2) * x * y + cons(3) * tf.pow(y, cons(2)) + cons(4) * x + cons(5) * y + cons(6)

where:

def cons(x):
    return tf.constant(x, dtype=tf.float32)

So in algebraic terms, this function is

f(x, y) = x² + 2xy + 3y² + 4x + 5y + 6

Now we define a method that computes the Hessian:

def compute_hessian(fn, var_list):
    mat = []
    for v1 in var_list:
        temp = []
        for v2 in var_list:
            # differentiate twice: first w.r.t. v2, then w.r.t. v1
            temp.append(tf.gradients(tf.gradients(fn, v2)[0], v1)[0])
        # tf.gradients returns None where there is no dependency, so replace None with 0
        temp = [cons(0) if t is None else t for t in temp]
        temp = tf.stack(temp)  # tf.pack in TF < 1.0
        mat.append(temp)
    mat = tf.stack(mat)  # tf.pack in TF < 1.0
    return mat

and call it with:

# arg1: our defined function, arg2: list of tf variables associated with the function
hessian = compute_hessian(f, [x, y])

Now we grab a TensorFlow session, initialize the variables, and run hessian:

sess = tf.Session()
sess.run(tf.global_variables_initializer())  # tf.initialize_all_variables() in TF < 0.12
print(sess.run(hessian))

Note: since the function we used is quadratic (and we are differentiating twice), the Hessian returned will have constant values irrespective of the values of the variables.

The output is:

[[ 2.  2.]
 [ 2.  6.]]
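For readers on TensorFlow 2.x (which did not exist when this answer was written), here is a minimal sketch of the same computation in eager mode with tf.GradientTape, following TF's documented higher-order gradient pattern; the variable names are illustrative, not part of the original answer:

import tensorflow as tf

x = tf.Variable(0.5)
y = tf.Variable(0.5)
variables = [x, y]

with tf.GradientTape() as outer:
    with tf.GradientTape() as inner:
        f = x**2 + 2.0*x*y + 3.0*y**2 + 4.0*x + 5.0*y + 6.0
    # first derivatives, computed inside the outer tape so they are recorded
    grad = tf.stack(inner.gradient(f, variables))
# the Jacobian of the gradient vector is the Hessian
hessian = tf.stack(outer.jacobian(grad, variables), axis=1)
print(hessian.numpy())  # -> [[2. 2.] [2. 6.]]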
answered by Rajarshee Mitra


A word of caution: Hessian matrices (or, more generally, tensors) are expensive to compute and store. You may want to reconsider whether you really need the full Hessian, or just some of its properties. A number of these, including the trace, norms, and top eigenvalues, can be obtained without forming the explicit Hessian matrix, using only a Hessian-vector product oracle. In turn, Hessian-vector products can be implemented efficiently in the leading autodiff frameworks, such as TensorFlow and PyTorch.
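For instance, a minimal sketch of a Hessian-vector product via double backprop in TensorFlow 2.x (hvp and its arguments are illustrative names, not a library API):

import tensorflow as tf

def hvp(loss_fn, variables, vector):
    # Hessian-vector product H @ v via double backprop: differentiate
    # the inner product <grad(loss), v> once more w.r.t. the variables.
    with tf.GradientTape() as outer:
        with tf.GradientTape() as inner:
            loss = loss_fn()
        grads = inner.gradient(loss, variables)
        gv = tf.add_n([tf.reduce_sum(g * v) for g, v in zip(grads, vector)])
    return outer.gradient(gv, variables)

# e.g. with f from the answer above, H @ (1, 0) is the first column of H:
x = tf.Variable(0.5)
y = tf.Variable(0.5)
loss_fn = lambda: x**2 + 2.0*x*y + 3.0*y**2 + 4.0*x + 5.0*y + 6.0
print(hvp(loss_fn, [x, y], [tf.constant(1.0), tf.constant(0.0)]))  # -> [2.0, 2.0]

Each such product costs only a small constant factor more than one gradient evaluation, which is what makes trace and eigenvalue estimators based on it practical.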

answered by Maciej S.