Second derivative in Keras

For a custom loss for a NN I use a function given in the original post as an image. u, given a pair (t, x), both points in an interval, is the output of my NN. The problem is that I'm stuck on how to compute the second derivative using K.gradients (K being the Keras backend with TensorFlow):

def custom_loss(input_tensor, output_tensor):
    def loss(y_true, y_pred):

        # so far, I can only get this right, naturally:            
        gradient = K.gradients(output_tensor, input_tensor)

        # here I'm failing badly:

        # d_t = K.gradients(output_tensor, input_tensor)[0]
        # dd_x = K.gradient(K.gradients(output_tensor, input_tensor),
        #                   input_tensor[1])

        return gradient # obviously not useful, just for it to work
    return loss  

All my attempts, based on Input(shape=(2,)), were variations of the commented lines in the snippet above, mainly trying to find the right indexing of the resulting tensor.

Sure enough, I lack knowledge of how exactly tensors work. By the way, I know that in TensorFlow itself I could simply use tf.hessians, but I noticed it's just not available when using TF as a Keras backend.

Asked Apr 20 '18 by Lucas Farias



1 Answer

In order for a K.gradients() layer to work like that, you have to enclose it in a Lambda() layer, because otherwise a full Keras layer is not created, and you can't chain it or train through it. So this code will work (tested):

import keras
from keras.models import *
from keras.layers import *
from keras import backend as K
import tensorflow as tf

def grad( y, x ):
    # K.gradients() has to be wrapped in a Lambda() so the result is a real Keras layer
    return Lambda( lambda z: K.gradients( z[ 0 ], z[ 1 ] ), output_shape = [1] )( [ y, x ] )

def network( i, d ):
    # models f( x ) = log( x + d ) for the fixed input i
    m = Add()( [ i, d ] )
    a = Lambda(lambda x: K.log( x ) )( m )
    return a

fixed_input = Input(tensor=tf.constant( [ 1.0 ] ) )
double = Input(tensor=tf.constant( [ 2.0 ] ) )

a = network( fixed_input, double )

b = grad( a, fixed_input )
c = grad( b, fixed_input )
d = grad( c, fixed_input )
e = grad( d, fixed_input )

model = Model( inputs = [ fixed_input, double ], outputs = [ a, b, c, d, e ] )

print( model.predict( x=None, steps = 1 ) )

network() models f( x ) = log( x + 2 ) at x = 1, and grad() is where the gradient calculation is done. This code outputs:

[array([1.0986123], dtype=float32), array([0.33333334], dtype=float32), array([-0.11111112], dtype=float32), array([0.07407408], dtype=float32), array([-0.07407409], dtype=float32)]

which are the correct values for log( 3 ), 1 / 3, -1 / 3^2, 2 / 3^3, -6 / 3^4.
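
As a sanity check, these values can be reproduced symbolically; a quick sketch with sympy (not used anywhere in the original code) would be:

import sympy as sp

x = sp.Symbol( 'x' )
f = sp.log( x + 2 )

# n-th derivatives of log( x + 2 ), evaluated at x = 1
for n in range( 5 ):
    print( float( sp.diff( f, x, n ).subs( x, 1 ) ) )
# approx 1.0986, 0.3333, -0.1111, 0.0741, -0.0741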


Reference TensorFlow code

For reference, the same code in plain TensorFlow (used for testing):

import tensorflow as tf

a = tf.constant( 1.0 )
a2 = tf.constant( 2.0 )

b = tf.log( a + a2 )
c = tf.gradients( b, a )
d = tf.gradients( c, a )
e = tf.gradients( d, a )
f = tf.gradients( e, a )

with tf.Session() as sess:
    print( sess.run( [ b, c, d, e, f ] ) )

outputs the same values:

[1.0986123, [0.33333334], [-0.11111112], [0.07407408], [-0.07407409]]

Hessians

tf.hessians() does return the second derivative, but it's just a shorthand for chaining two tf.gradients() calls. The Keras backend doesn't have hessians, though, so you have to chain two K.gradients() calls yourself.
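
For completeness, a minimal plain-TF1 sketch of tf.hessians() on the same function, in the same session style as the reference code above:

import tensorflow as tf

a = tf.constant( [ 1.0 ] )
b = tf.log( a + 2.0 )

# tf.hessians returns the second derivatives of b w.r.t. a; a 1x1 matrix here
h = tf.hessians( b, a )

with tf.Session() as sess:
    print( sess.run( h ) )   # approx [[-0.1111]], i.e. -1/9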

Numerical approximation

If for some reason none of the above works, then you might want to consider numerically approximating the second derivative by taking differences over a small ε distance. This basically triples the network for each input, so this solution introduces serious efficiency concerns, besides lacking in accuracy. Anyway, the code (tested):

import keras
from keras.models import *
from keras.layers import *
from keras import backend as K
import tensorflow as tf

def network( i, d ):
    m = Add()( [ i, d ] )
    a = Lambda(lambda x: K.log( x ) )( m )
    return a

fixed_input = Input(tensor=tf.constant( [ 1.0 ], dtype = tf.float64 ) )
double = Input(tensor=tf.constant( [ 2.0 ], dtype = tf.float64 ) )

epsilon = Input( tensor = tf.constant( [ 1e-7 ], dtype = tf.float64 ) )
eps_reciproc = Input( tensor = tf.constant( [ 1e+7 ], dtype = tf.float64 ) )

a0 = network( Subtract()( [ fixed_input, epsilon ] ), double )
a1 = network(               fixed_input,              double )
a2 = network(      Add()( [ fixed_input, epsilon ] ), double )

d0 = Subtract()( [ a1, a0 ] )
d1 = Subtract()( [ a2, a1 ] )

dv0 = Multiply()( [ d0, eps_reciproc ] )
dv1 = Multiply()( [ d1, eps_reciproc ] )

dd0 = Multiply()( [ Subtract()( [ dv1, dv0 ] ), eps_reciproc ] )

model = Model( inputs = [ fixed_input, double, epsilon, eps_reciproc ], outputs = [ a0, dv0, dd0 ] )

print( model.predict( x=None, steps = 1 ) )

Outputs:

[array([1.09861226]), array([0.33333334]), array([-0.1110223])]

(This only gets to the second derivative.)
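
The graph above implements the standard central second difference; for reference, a plain-NumPy sketch of the same formula (with a slightly larger ε, which improves the second-difference accuracy in float64):

import numpy as np

def f( x ):
    return np.log( x + 2.0 )

x, eps = 1.0, 1e-4

first  = ( f( x + eps ) - f( x ) ) / eps                          # forward difference
second = ( f( x + eps ) - 2.0 * f( x ) + f( x - eps ) ) / eps**2  # central second difference

print( first, second )   # approx 0.3333 and -0.1111 (exact: 1/3 and -1/9)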

Answered Nov 07 '22 by Peter Szoldan