Keras BFGS training using Scipy minimize

I would like to train a feed-forward neural network implemented in Keras using BFGS. To see whether it could be done, I implemented a perceptron using scipy.optimize.minimize, with the code below.

from __future__ import print_function
import numpy as np
from scipy.optimize import minimize
from keras.models import Sequential
from keras.layers.core import Dense

# Dummy training examples
X = np.array([[-1,2,-3,-1],[3,2,-1,-4]]).astype('float')
Y = np.array([[2],[-1]]).astype('float')

model = Sequential()
model.add(Dense(1, activation='sigmoid', input_dim=4))

def loss(W):
    # Unpack the flat parameter vector W into the shapes Keras expects:
    # a (4, 1) kernel and a (1,) bias
    weightsList = [np.zeros((4, 1)), np.zeros(1)]
    for i in range(4):
        weightsList[0][i, 0] = W[i]
    weightsList[1][0] = W[4]
    model.set_weights(weightsList)
    preds = model.predict(X)
    # Mean squared error over the training examples
    mse = np.sum(np.square(np.subtract(preds, Y))) / len(X[:, 0])
    return mse

# Dummy first guess
V = [1.0, 2.0, 3.0, 4.0, 1.0]
res = minimize(loss, x0=V, method='BFGS', options={'disp': True})
print(res.x)

However, the output shows that the optimizer terminates immediately without improving the loss:

Using Theano backend.
Using gpu device 0: GeForce GTX 960M (CNMeM is disabled, cuDNN not available)
Optimization terminated successfully.
         Current function value: 2.499770
         Iterations: 0
         Function evaluations: 7
         Gradient evaluations: 1
[ 1.  2.  3.  4.  1.]

Any ideas why this didn't work? Is it because I didn't input the gradient to minimize, and it cannot calculate the numerical approximation in this case?

asked Jul 19 '16 by Rish

1 Answer

Is it because I didn't input the gradient to minimize, and it cannot calculate the numerical approximation in this case?

It's because you don't supply the gradient, so scipy approximates it by numerical differentiation: it evaluates the function at X, then at X + epsilon, to estimate the local gradient.
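Concretely, the forward-difference approximation looks roughly like this (a minimal sketch for illustration, not scipy's actual code; scipy's default step is the square root of float64 machine epsilon, about 1.5e-8):

import numpy as np

def approx_gradient(f, x, eps=np.sqrt(np.finfo(np.float64).eps)):
    # Forward differences: df/dx_i ~= (f(x + eps*e_i) - f(x)) / eps
    x = np.asarray(x, dtype=np.float64)
    f0 = f(x)
    grad = np.zeros_like(x)
    for i in range(x.size):
        x_step = x.copy()
        x_step[i] += eps
        grad[i] = (f(x_step) - f0) / eps
    return grad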

But scipy's default epsilon is so small that when Theano casts the perturbed weights to 32-bit floats, the change is lost entirely: every probe returns exactly the same loss. The starting guess is not in fact a minimum; scipy just thinks so because it sees no change in the value of the objective function.
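You can check this directly: in 32-bit precision, a perturbation of the default size is rounded away entirely (a quick numpy demonstration):

import numpy as np

eps = np.sqrt(np.finfo(np.float64).eps)  # scipy's default step, ~1.49e-08
w = np.float32(1.0)
print(np.float32(w + eps) == w)  # True: the perturbation vanishes in float32

So you simply need to increase the epsilon used for the numerical differentiation: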

V = [1.0, 2.0, 3.0, 4.0, 1.0]
print('Starting loss = {}'.format(loss(V)))
# set the eps option to increase the epsilon used in numerical diff
res = minimize(loss, x0=V, method='BFGS', options={'eps': 1e-6, 'disp': True})
print('Ending loss = {}'.format(loss(res.x)))

Which gives:

Using Theano backend.
Starting loss = 2.49976992001
Optimization terminated successfully.
         Current function value: 1.002703
         Iterations: 19
         Function evaluations: 511
         Gradient evaluations: 73
Ending loss = 1.00270344184
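
As a follow-up, you can sidestep the finite-difference fragility entirely by returning the analytic gradient from the objective and passing jac=True to minimize. Below is a sketch under the same setup as the question (Keras 1.x on the Theano backend, reusing model, X, Y and V from above); eval_loss_and_grads and loss_and_grad are helper names introduced here for illustration:

import numpy as np
import keras.backend as K

# Symbolic MSE loss and its gradient w.r.t. the trainable weights
y_true = K.placeholder(shape=(None, 1))
mse_sym = K.mean(K.square(model.output - y_true))
grads_sym = K.gradients(mse_sym, model.trainable_weights)
eval_loss_and_grads = K.function([model.input, y_true], [mse_sym] + grads_sym)

def loss_and_grad(W):
    # Pack the flat parameter vector into the model, as in the question
    model.set_weights([W[:4].reshape(4, 1), W[4:].reshape(1)])
    outs = eval_loss_and_grads([X, Y])
    # Flatten the per-tensor gradients into one float64 vector for scipy
    grad_flat = np.concatenate([np.asarray(g).ravel() for g in outs[1:]])
    return float(outs[0]), grad_flat.astype('float64')

# jac=True tells minimize that the objective returns (loss, gradient)
res = minimize(loss_and_grad, x0=np.array(V), jac=True, method='BFGS',
               options={'disp': True})

With an exact gradient BFGS no longer depends on a finite-difference step, so the float32 rounding problem disappears and the optimizer should need far fewer function evaluations than the 511 seen above.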
answered Nov 04 '22 by yhenon