Full gradient descent in keras

I am trying to implement full gradient descent in Keras, meaning that every epoch trains on the entire dataset in a single batch. This is why the batch size is set to the size of the training set.

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD,Adam
from keras import regularizers
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import random
from numpy.random import seed

def xrange(start_point, end_point, N, base):
    # N log-spaced points, rescaled to [0, 1] and then mapped to [start_point, end_point]
    temp = np.logspace(0.1, 1, N, base=base, endpoint=False)
    temp = temp - temp.min()
    temp = temp / temp.max()  # now between 0 and 1
    return (end_point - start_point) * temp + start_point  # now between start_point and end_point

def train_model(x_train,y_train,x_test):
    #seed(1)
    model=Sequential()
    num_units=100
    act='relu'
    model.add(Dense(num_units,input_shape=(1,),activation=act)) 
    model.add(Dense(num_units,activation=act))
    model.add(Dense(num_units,activation=act))
    model.add(Dense(num_units,activation=act))
    model.add(Dense(1,activation='tanh')) #output layer 1 unit ; activation='tanh'
    model.compile(Adam(),'mean_squared_error',metrics=['mse'])
    history=model.fit(x_train,y_train,batch_size=len(x_train),epochs=500,verbose=0,validation_split = 0.2 ) # train on the noisy targets, one batch per epoch
    fit=model.predict(x_test)
    loss = history.history['loss']
    val_loss = history.history['val_loss']
    return fit

N = 1024
start_point=-5.25
end_point=5.25
base=500 # base of the logarithmic spacing of the training points
train_step=0.0007
x_test=np.arange(start_point,end_point,train_step+0.05)

x_train=xrange(start_point,end_point,N,base)
#random.shuffle(x_train)

function_y=np.sin(3*x_train)/2
noise=np.random.uniform(-0.2,0.2,len(function_y))
y_train=function_y+noise
fit=train_model(x_train,y_train,x_test)

plt.scatter(x_train,y_train, facecolors='none', edgecolors='g') # training data (green)
plt.scatter(x_test, fit, facecolors='none', edgecolors='b') # model predictions (blue)

However, when I uncomment #random.shuffle(x_train) to shuffle the training data, I get a different plot.

I don't understand why the plots differ (the green circles are the training data and the blue circles are what the model learned). In both cases the batch contains the ENTIRE dataset, so shuffling shouldn't change anything.
Thank you.

Ariel

asked Dec 13 '18 20:12

user552231

1 Answer

This happens for two reasons:

First, when the data is not shuffled, the train/validation split is inappropriate.
Second, full gradient descent performs a single update per epoch, so more training epochs might be required to converge.
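
The intuition in the question is sound as far as the gradient itself goes: averaged over the whole dataset, it does not depend on the order of the samples. The sketch below illustrates this with a toy linear model and NumPy only (the model, the weights w and b, and the helper full_batch_mse_gradient are illustrative assumptions, not the network from the question). It also shows what shuffling does change: which samples end up in the last 20% of the arrays.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data and a linear model y_hat = w * x + b (illustrative only).
x = rng.uniform(-5.25, 5.25, size=1024)
y = np.sin(3 * x) / 2
w, b = 0.3, -0.1


def full_batch_mse_gradient(x, y, w, b):
    """Gradient of the mean squared error over the whole dataset w.r.t. (w, b)."""
    residual = (w * x + b) - y
    return np.array([2 * np.mean(residual * x), 2 * np.mean(residual)])


perm = rng.permutation(len(x))
grad_original = full_batch_mse_gradient(x, y, w, b)
grad_shuffled = full_batch_mse_gradient(x[perm], y[perm], w, b)

# The two gradients agree up to floating-point rounding.
same_gradient = np.allclose(grad_original, grad_shuffled)
print("same full-batch gradient:", same_gradient)

# The samples sitting in the last 20% of the arrays are different, though.
split_at = int(len(x) * 0.8)
same_tail = np.array_equal(np.sort(x[split_at:]), np.sort(x[perm][split_at:]))
print("same held-out tail:", same_tail)
```

So the order of the samples only matters through whatever part of the pipeline looks at positions in the array, which is what the rest of this answer is about.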

Why doesn't your model match the wave?

From model.fit:

validation_split: Float between 0 and 1. Fraction of the training data to be used as validation data. The model will set apart this fraction of the training data, will not train on it, and will evaluate the loss and any model metrics on this data at the end of each epoch. The validation data is selected from the last samples in the x and y data provided, before shuffling.

This means that your validation set consists of the last 20% of the training samples. Because the values of your independent variable (x_train) are log-spaced and sorted, your train/validation split looks like this:

split_point = int(0.2*N)
x_val = x_train[-split_point:]
y_val = y_train[-split_point:]
x_train_ = x_train[:-split_point]
y_train_ = y_train[:-split_point]
plt.scatter(x_train_, y_train_, c='g')
plt.scatter(x_val, y_val, c='r')
plt.show()

[Figure: train/validation split]

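In the previous plot, training and validation data are represented by green and red points, respectively. Note that your training dataset is not representative of the whole population: the model is only trained on the left-hand part of the interval and never sees the rest of the wave.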
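
To put numbers on this, here is a minimal NumPy-only sketch that rebuilds the question's x values and checks where the hold-out boundary falls, with and without shuffling. It assumes the held-out block is simply the tail of the array, as the quoted documentation describes; log_spaced_range is the question's helper under a hypothetical new name. With the question's settings the unshuffled training part should end at roughly x ≈ -1.8.

```python
import numpy as np


def log_spaced_range(start_point, end_point, n, base):
    """Same construction as the question's xrange helper (renamed for this sketch)."""
    temp = np.logspace(0.1, 1, n, base=base, endpoint=False)
    temp = temp - temp.min()
    temp = temp / temp.max()
    return (end_point - start_point) * temp + start_point


x_train = log_spaced_range(-5.25, 5.25, 1024, 500)
validation_split = 0.2
# Assumption: the last `validation_split` fraction of the array is held out.
split_at = int(len(x_train) * (1 - validation_split))

fit_part, val_part = x_train[:split_at], x_train[split_at:]
print("trained on   x in [%.2f, %.2f]" % (fit_part.min(), fit_part.max()))
print("validated on x in [%.2f, %.2f]" % (val_part.min(), val_part.max()))

# Shuffling before fitting spreads the held-out tail over the whole interval.
rng = np.random.default_rng(0)
shuffled = x_train[rng.permutation(len(x_train))]
fit_shuffled, val_shuffled = shuffled[:split_at], shuffled[split_at:]
print("after shuffling, trained on x in [%.2f, %.2f]"
      % (fit_shuffled.min(), fit_shuffled.max()))
```

In the question's script, y_train is computed from x_train after the optional shuffle, so the (x, y) pairs stay aligned either way; only the contents of the held-out tail change.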

Why does it still not match the training dataset?

In addition to the inappropriate train/validation split, full gradient descent might require more training epochs to converge (the gradients are less noisy, but it performs only a single gradient update per epoch). If, instead, you train your model for ~1500 epochs (or use mini-batch gradient descent with a batch size of, say, 32), you end up getting:

[Figure: result]
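
The difference in the number of weight updates is easy to work out. The sketch below counts updates under the assumption that each epoch performs one update per batch, with a smaller final batch when the sizes do not divide evenly; total_updates is a hypothetical helper, not a Keras function.

```python
import math

n_samples = 1024
validation_split = 0.2
# Samples left for training once the validation tail is held out.
n_fit = int(n_samples * (1 - validation_split))


def total_updates(n_fit, batch_size, epochs):
    """One update per batch, ceil(n_fit / batch_size) batches per epoch."""
    return math.ceil(n_fit / batch_size) * epochs


full_500 = total_updates(n_fit, batch_size=n_fit, epochs=500)
full_1500 = total_updates(n_fit, batch_size=n_fit, epochs=1500)
mini_500 = total_updates(n_fit, batch_size=32, epochs=500)

print("full batch, 500 epochs:   ", full_500)
print("full batch, 1500 epochs:  ", full_1500)
print("batch size 32, 500 epochs:", mini_500)
```

With the same 500 epochs, a batch size of 32 gives the optimizer many more (noisier) steps than a single full batch per epoch, which is why either more epochs or smaller batches help here.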

answered Oct 18 '22 21:10

rvinas

Tags:

python

machine-learning

deep-learning

keras

gradient-descent