I have a loss value/function and I would like to compute all the second derivatives with respect to a tensor f (of size n). I managed to use tf.gradients twice, but when applying it for the second time, it sums the derivatives across the first input (see second_derivatives in my code).
Also I managed to retrieve the Hessian matrix, but I would like to only compute its diagonal to avoid extra-computation.
import tensorflow as tf
import numpy as np
f = tf.Variable(np.array([[1., 2., 0]]).T)
loss = tf.reduce_prod(f ** 2 - 3 * f + 1)
first_derivatives = tf.gradients(loss, f)[0]
second_derivatives = tf.gradients(first_derivatives, f)[0]
hessian = [tf.gradients(first_derivatives[i,0], f)[0][:,0] for i in range(3)]
model = tf.initialize_all_variables()
with tf.Session() as sess:
    sess.run(model)
    print "\nloss\n", sess.run(loss)
    print "\nloss'\n", sess.run(first_derivatives)
    print "\nloss''\n", sess.run(second_derivatives)
    hessian_value = np.array(map(list, sess.run(hessian)))
    print "\nHessian\n", hessian_value
My thinking was that tf.gradients(first_derivatives, f[0, 0])[0] would retrieve, for instance, the second derivative with respect to f_0, but it seems that TensorFlow does not allow differentiating with respect to a slice of a tensor.
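For reference, one workaround in the graph-mode API along those lines (a sketch, not code from the question) is to build f from separate scalar variables and stack them, so that each component is a valid target for tf.gradients:
# Sketch only: builds f from scalar variables so tf.gradients can target each one.
import tensorflow.compat.v1 as tf   # graph-mode API (or plain TensorFlow 1.x)
tf.disable_eager_execution()

f_parts = [tf.Variable(v) for v in (1., 2., 0.)]   # one scalar variable per component
f = tf.stack(f_parts)                              # shape (3,)
loss = tf.reduce_prod(f ** 2 - 3 * f + 1)

grads = tf.gradients(loss, f_parts)                # dloss/df_i, one scalar per component
# d/df_i of dloss/df_i gives the Hessian diagonal without the off-diagonal terms
hessian_diag = tf.stack([tf.gradients(g, v)[0] for g, v in zip(grads, f_parts)])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(hessian_diag))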
The Hessian matrix is a way of organizing all the second partial derivative information of a multivariable function.
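Concretely, for a scalar loss L of a vector f of size n, the entries are H_ij = d^2 L / (df_i df_j), and the diagonal the question asks for is just (d^2 L / df_1^2, ..., d^2 L / df_n^2).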
TensorFlow calculates derivatives using automatic differentiation. This is different from symbolic differentiation and from numerical differentiation (a.k.a. finite differences); it is less a clever mathematical trick than a clever programming technique.
To differentiate automatically, TensorFlow needs to remember which operations happen, and in what order, during the forward pass. Then, during the backward pass, TensorFlow traverses this list of operations in reverse order to compute gradients.
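A minimal sketch of that tape mechanism in TF 2.x (the values and names here are just for illustration):
import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as tape:      # records operations on watched tensors
    y = x ** 2 + 2.0 * x             # forward pass
dy_dx = tape.gradient(y, x)          # backward pass over the recorded ops
print(dy_dx)                         # 8.0, since dy/dx = 2x + 2 at x = 3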
The following function calculates the second derivatives (the diagonal of the Hessian matrix) in TensorFlow 2.x:
%tensorflow_version 2.x # Tells Colab to load TF 2.x
import tensorflow as tf
def calc_hessian_diag(f, x):
    """
    Calculates the diagonal entries of the Hessian of the function f
    (which maps rank-1 tensors to scalars) at coordinates x (rank-1
    tensors).

    Let k be the number of points in x, and n be the dimensionality of
    each point. For each point k, the function returns
    (d^2f/dx_1^2, d^2f/dx_2^2, ..., d^2f/dx_n^2) .

    Inputs:
        f (function): Takes a shape-(k,n) tensor and outputs a
            shape-(k,) tensor.
        x (tf.Tensor): The points at which to evaluate the Laplacian
            of f. Shape = (k,n).

    Outputs:
        A tensor containing the diagonal entries of the Hessian of f at
        points x. Shape = (k,n).
    """
    # Use the unstacking and re-stacking trick, which comes
    # from https://github.com/xuzhiqin1990/laplacian/
    with tf.GradientTape(persistent=True) as g1:
        # Turn x into a list of n tensors of shape (k,)
        x_unstacked = tf.unstack(x, axis=1)
        g1.watch(x_unstacked)

        with tf.GradientTape() as g2:
            # Re-stack x before passing it into f
            x_stacked = tf.stack(x_unstacked, axis=1)  # shape = (k,n)
            g2.watch(x_stacked)
            f_x = f(x_stacked)  # shape = (k,)

        # Calculate gradient of f with respect to x
        df_dx = g2.gradient(f_x, x_stacked)  # shape = (k,n)
        # Turn df/dx into a list of n tensors of shape (k,)
        df_dx_unstacked = tf.unstack(df_dx, axis=1)

    # Calculate 2nd derivatives
    d2f_dx2 = []
    for df_dxi, xi in zip(df_dx_unstacked, x_unstacked):
        # Take 2nd derivative of each dimension separately:
        #   d/dx_i (df/dx_i)
        d2f_dx2.append(g1.gradient(df_dxi, xi))

    # Stack 2nd derivatives
    d2f_dx2_stacked = tf.stack(d2f_dx2, axis=1)  # shape = (k,n)
    return d2f_dx2_stacked
Here's an example usage, with the function f(x) = ln(r^2), where x are 3D coordinates and r is the radius in spherical coordinates:
f = lambda q : tf.math.log(tf.math.reduce_sum(q**2, axis=1))
x = tf.random.uniform((5,3))
d2f_dx2 = calc_hessian_diag(f, x)
print(d2f_dx2)
The output will look something like this:
tf.Tensor(
[[ 1.415968 1.0215727 -0.25363517]
[-0.67299247 2.4847088 0.70901346]
[ 1.9416015 -1.1799507 1.3937857 ]
[ 1.4748447 0.59702784 -0.52290654]
[ 1.1786096 0.07442689 0.2396735 ]], shape=(5, 3), dtype=float32)
We can check the correctness of the implementation by calculating the Laplacian (i.e., by summing the diagonal of the Hessian matrix) and comparing it to the theoretical answer for our chosen function, 2 / r^2:
print(tf.reduce_sum(d2f_dx2, axis=1)) # Laplacian from summing above results
print(2./tf.math.reduce_sum(x**2, axis=1)) # Analytic expression for Laplacian
I get the following:
tf.Tensor([2.1839054 2.5207298 2.1554365 1.5489659 1.49271 ], shape=(5,), dtype=float32)
tf.Tensor([2.1839058 2.5207298 2.1554365 1.5489662 1.4927098], shape=(5,), dtype=float32)
They agree to within rounding error.
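If you don't mind paying for the extra computation once, you can also cross-check the diagonal against the full per-point Hessian; a sketch using nested tapes and batch_jacobian (reusing the f and x defined above):
with tf.GradientTape() as g1:
    g1.watch(x)
    with tf.GradientTape() as g2:
        g2.watch(x)
        f_x = f(x)                              # shape = (k,)
    df_dx = g2.gradient(f_x, x)                 # shape = (k,n)
full_hessian = g1.batch_jacobian(df_dx, x)      # shape = (k,n,n)
print(tf.linalg.diag_part(full_hessian))        # should match calc_hessian_diag(f, x)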