I would like to train a feed-forward neural network implemented in Keras using BFGS. To see if it could be done, I implemented a Perceptron using scipy.optimize.minimize, with the code below.
from __future__ import print_function
import numpy as np
from scipy.optimize import minimize
from keras.models import Sequential
from keras.layers.core import Dense
# Dummy training examples
X = np.array([[-1,2,-3,-1],[3,2,-1,-4]]).astype('float')
Y = np.array([[2],[-1]]).astype('float')
model = Sequential()
model.add(Dense(1, activation='sigmoid', input_dim=4))
def loss(W):
    weightsList = [np.zeros((4,1)), np.zeros(1)]
    for i in range(4):
        weightsList[0][i,0] = W[i]
    weightsList[1][0] = W[4]
    model.set_weights(weightsList)
    preds = model.predict(X)
    mse = np.sum(np.square(np.subtract(preds,Y)))/len(X[:,0])
    return mse
# Dummy first guess
V = [1.0, 2.0, 3.0, 4.0, 1.0]
res = minimize(loss, x0=V, method = 'BFGS', options={'disp':True})
print(res.x)
However, the output of this shows that the loss is not actually optimized:
Using Theano backend.
Using gpu device 0: GeForce GTX 960M (CNMeM is disabled, cuDNN not available)
Optimization terminated successfully.
Current function value: 2.499770
Iterations: 0
Function evaluations: 7
Gradient evaluations: 1
[ 1. 2. 3. 4. 1.]
Any ideas why this didn't work? Is it because I didn't pass the gradient to minimize, and it cannot compute the numerical approximation in this case?
Is it because I didn't pass the gradient to minimize, and it cannot compute the numerical approximation in this case?
It's because you don't provide the gradient, so scipy approximates it by numerical differentiation: it evaluates the function at X, then at X + epsilon, to estimate the local gradient.
But that epsilon is small enough that, in the conversion to 32-bit floats for Theano, the change is completely lost. The starting guess is not in fact a minimum; scipy just thinks so because it sees no change in the value of the objective function. You simply need to increase the epsilon, as such:
V = [1.0, 2.0, 3.0, 4.0, 1.0]
print('Starting loss = {}'.format(loss(V)))
# set the eps option to increase the epsilon used in numerical diff
res = minimize(loss, x0=V, method = 'BFGS', options={'eps':1e-6,'disp':True})
print('Ending loss = {}'.format(loss(res.x)))
Which gives:
Using Theano backend.
Starting loss = 2.49976992001
Optimization terminated successfully.
Current function value: 1.002703
Iterations: 19
Function evaluations: 511
Gradient evaluations: 73
Ending loss = 1.00270344184
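For reference, the float32 rounding that hides the default step can be checked directly. The sketch below is plain NumPy, independent of Keras/Theano, and assumes scipy's usual default step of the square root of float64 machine epsilon (roughly 1.5e-8); it shows the default perturbation vanishing under a 32-bit cast while 1e-6 survives:
import numpy as np
# scipy's default finite-difference step is about sqrt(float64 machine eps) ~ 1.49e-8
eps_default = np.sqrt(np.finfo(np.float64).eps)
eps_larger = 1e-6
w = np.float32(1.0)  # a weight as Theano stores it in 32-bit
# Default step: the perturbed value rounds back to the original, so the loss looks flat
print(np.float32(w + eps_default) == w)  # True
# Larger step: the perturbation survives the cast, so finite differences see a change
print(np.float32(w + eps_larger) == w)   # False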