For a custom loss for a NN I use the function . u, given a pair (t, x), both points in an interval, is the output of my NN. The problem is that I'm stuck on how to compute the second derivative using K.gradients (K being the Keras backend, with TensorFlow underneath):
def custom_loss(input_tensor, output_tensor):
    def loss(y_true, y_pred):
        # so far, I can only get this right, naturally:
        gradient = K.gradients(output_tensor, input_tensor)
        # here I'm failing badly:
        # d_t = K.gradients(output_tensor, input_tensor)[0]
        # dd_x = K.gradients(K.gradients(output_tensor, input_tensor),
        #                    input_tensor[1])
        return gradient  # obviously not useful, just for it to work
    return loss
All my attempts, based on Input(shape=(2,)), were variations of the commented lines in the snippet above, mainly trying to find the right indexing of the resulting tensor.
Sure enough, I lack knowledge of how exactly tensors work. By the way, I know that in TensorFlow itself I could simply use tf.hessians, but I noticed it's just not present when using TF as a backend.
The second derivative measures the instantaneous rate of change of the first derivative. The sign of the second derivative tells us whether the slope of the tangent line to f(x) is increasing or decreasing.
Gradient descent is a first-order optimization algorithm, which means it doesn't take into account the second derivatives of the cost function.
The second derivative simply measures how much the gradient/tangent slope f′(x) changes as we make small changes in x, i.e. how small changes in x change the gradient f′(x). So, for example, if we had a large second derivative and we made a tiny move, the slope of the tangent line would change a lot.
TensorFlow provides the tf.GradientTape API for automatic differentiation; that is, computing the gradient of a computation with respect to some inputs, usually tf.Variables.
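For reference, here is a minimal sketch of how the same second derivative of f(x) = log(x + 2) could be computed with nested tapes in modern TensorFlow 2; this is separate from the Keras-backend approach the question is about:

import tensorflow as tf

x = tf.Variable(1.0)
with tf.GradientTape() as outer_tape:
    with tf.GradientTape() as inner_tape:
        y = tf.math.log(x + 2.0)
    # first derivative, recorded by the outer tape so it can be differentiated again
    dy_dx = inner_tape.gradient(y, x)
d2y_dx2 = outer_tape.gradient(dy_dx, x)

print(dy_dx.numpy(), d2y_dx2.numpy())  # ~0.3333 and ~-0.1111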
In order for a K.gradients() call to work like that, you have to enclose it in a Lambda() layer, because otherwise a full Keras layer is not created, and you can't chain it or train through it. So this code will work (tested):
import keras
from keras.models import *
from keras.layers import *
from keras import backend as K
import tensorflow as tf

def grad( y, x ):
    return Lambda( lambda z: K.gradients( z[ 0 ], z[ 1 ] ), output_shape = [1] )( [ y, x ] )

def network( i, d ):
    m = Add()( [ i, d ] )
    a = Lambda(lambda x: K.log( x ) )( m )
    return a

fixed_input = Input(tensor=tf.constant( [ 1.0 ] ) )
double = Input(tensor=tf.constant( [ 2.0 ] ) )

a = network( fixed_input, double )
b = grad( a, fixed_input )
c = grad( b, fixed_input )
d = grad( c, fixed_input )
e = grad( d, fixed_input )

model = Model( inputs = [ fixed_input, double ], outputs = [ a, b, c, d, e ] )
print( model.predict( x=None, steps = 1 ) )
def network models f(x) = log(x + 2) at x = 1. def grad is where the gradient calculation is done. This code outputs:
[array([1.0986123], dtype=float32), array([0.33333334], dtype=float32), array([-0.11111112], dtype=float32), array([0.07407408], dtype=float32), array([-0.07407409], dtype=float32)]
which are the correct values for log(3), 1/3, -1/3², 2/3³, -6/3⁴.
For reference, the same code in plain TensorFlow (used for testing):
import tensorflow as tf
a = tf.constant( 1.0 )
a2 = tf.constant( 2.0 )
b = tf.log( a + a2 )
c = tf.gradients( b, a )
d = tf.gradients( c, a )
e = tf.gradients( d, a )
f = tf.gradients( e, a )
with tf.Session() as sess:
    print( sess.run( [ b, c, d, e, f ] ) )
outputs the same values:
[1.0986123, [0.33333334], [-0.11111112], [0.07407408], [-0.07407409]]
tf.hessians() does return the second derivative, but it's just a shorthand for chaining two tf.gradients() calls. The Keras backend doesn't have hessians though, so you do have to chain the two K.gradients() calls yourself.
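Applied back to the question's setup with Input(shape=(2,)), that chaining could look roughly like the sketch below. The column slicing (0 for t, 1 for x) and the final combination of d_t and dd_x are assumptions for illustration, since the actual loss expression isn't shown; untested as-is:

def custom_loss(input_tensor, output_tensor):
    def loss(y_true, y_pred):
        # first derivatives of u w.r.t. the whole (t, x) input: shape (batch, 2)
        grads = K.gradients(output_tensor, input_tensor)[0]
        d_t = grads[:, 0:1]  # du/dt, assuming column 0 holds t
        d_x = grads[:, 1:2]  # du/dx, assuming column 1 holds x
        # differentiate du/dx once more and keep the x column: d2u/dx2
        dd_x = K.gradients(d_x, input_tensor)[0][:, 1:2]
        # placeholder combination -- substitute the actual loss expression here
        return K.mean(K.square(d_t - dd_x))
    return loss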
If for some reason none of the above works, then you might want to consider numerically approximating the second derivative by taking differences over a small ε distance. This basically triples the network for each input, so besides lacking accuracy, this solution introduces serious efficiency considerations. Anyway, the code (tested):
import keras
from keras.models import *
from keras.layers import *
from keras import backend as K
import tensorflow as tf

def network( i, d ):
    m = Add()( [ i, d ] )
    a = Lambda(lambda x: K.log( x ) )( m )
    return a

fixed_input = Input(tensor=tf.constant( [ 1.0 ], dtype = tf.float64 ) )
double = Input(tensor=tf.constant( [ 2.0 ], dtype = tf.float64 ) )
epsilon = Input( tensor = tf.constant( [ 1e-7 ], dtype = tf.float64 ) )
eps_reciproc = Input( tensor = tf.constant( [ 1e+7 ], dtype = tf.float64 ) )

# evaluate the network at x - ε, x, and x + ε
a0 = network( Subtract()( [ fixed_input, epsilon ] ), double )
a1 = network( fixed_input, double )
a2 = network( Add()( [ fixed_input, epsilon ] ), double )

# finite differences for the first derivative on both sides of x...
d0 = Subtract()( [ a1, a0 ] )
d1 = Subtract()( [ a2, a1 ] )
dv0 = Multiply()( [ d0, eps_reciproc ] )
dv1 = Multiply()( [ d1, eps_reciproc ] )
# ...and their difference for the second derivative
dd0 = Multiply()( [ Subtract()( [ dv1, dv0 ] ), eps_reciproc ] )

model = Model( inputs = [ fixed_input, double, epsilon, eps_reciproc ], outputs = [ a0, dv0, dd0 ] )
print( model.predict( x=None, steps = 1 ) )
Outputs:
[array([1.09861226]), array([0.33333334]), array([-0.1110223])]
(This only gets to the second derivative.)