
Is it possible to continue training from a specific epoch?

Tags:

keras

The resource manager I use to fit a Keras model limits server access to one day at a time. After that day, I need to start a new job. Is it possible with Keras to save the current model at epoch K, and then load that model in a new job to continue training from epoch K+1?

mossaab asked Apr 01 '16 12:04



2 Answers

You can save weights after every epoch by specifying a callback:

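from keras.callbacks import ModelCheckpoint
# {val_loss} in the filename is only available when fit() is given validation data;
# validation_split=0.1 is an illustrative choice. nb_epoch was renamed epochs in Keras 2.
weight_save_callback = ModelCheckpoint('/path/to/weights.{epoch:02d}-{val_loss:.2f}.hdf5', monitor='val_loss', verbose=0, save_best_only=False, mode='auto')
model.fit(X_train, y_train, batch_size=batch_size, nb_epoch=nb_epoch, validation_split=0.1, callbacks=[weight_save_callback])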

This will save the weights after every epoch. You can then load them with:

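from keras.models import Sequential
model = Sequential()
model.add(...)
# restore the weights written by the checkpoint callback
model.load_weights('path/to/weights.hdf5')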

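Of course, the model architecture needs to be the same in both cases, because load_weights only restores the layer weights into the model you have just built.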
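In a new job you first have to work out which checkpoint is the most recent one. Here is a minimal sketch of that step, assuming the files follow the 'weights.{epoch:02d}-{val_loss:.2f}.hdf5' pattern from the callback. The helper name latest_checkpoint is hypothetical and not part of Keras. In recent Keras versions the {epoch} field is 1-based, so it equals the number of completed epochs; older releases may differ, so treat that as an assumption.

```python
import re
from pathlib import Path

# Matches names produced by the pattern 'weights.{epoch:02d}-{val_loss:.2f}.hdf5'.
_CHECKPOINT_RE = re.compile(r"^weights\.(\d+)-.*\.hdf5$")


def latest_checkpoint(directory):
    """Return (path, epoch) for the highest-numbered checkpoint, or None.

    Hypothetical helper: it only inspects filenames, it does not open the files.
    """
    best = None
    for path in Path(directory).iterdir():
        match = _CHECKPOINT_RE.match(path.name)
        if match is None:
            continue  # not a checkpoint file
        epoch = int(match.group(1))
        if best is None or epoch > best[1]:
            best = (path, epoch)
    return best
```

The returned path can be passed to model.load_weights, and the returned epoch number tells you how far the previous job got.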

yhenon answered Nov 08 '22 21:11


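You can pass the initial_epoch argument to model.fit. Training then resumes from that epoch, so the epoch counter (and any epoch number in checkpoint filenames) continues where the previous job stopped. It does not restore the model itself, so you still have to load the saved weights or model first.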
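A minimal sketch of how this might look, assuming Keras 2 or later, where fit accepts initial_epoch and epochs. The wrapper name resume_training and its parameters are hypothetical. It expects a model whose weights have already been restored from the last checkpoint.

```python
def resume_training(model, x, y, completed_epochs, total_epochs, **fit_kwargs):
    """Continue model.fit from `completed_epochs` up to `total_epochs`.

    Hypothetical wrapper; `model` is assumed to be a compiled Keras model
    whose weights were already loaded from the previous job's checkpoint.
    """
    if completed_epochs >= total_epochs:
        return None  # nothing left to train
    # initial_epoch is the 0-based index of the first epoch to run; epochs is
    # the index of the final epoch, not the number of additional epochs.
    return model.fit(
        x, y,
        initial_epoch=completed_epochs,
        epochs=total_epochs,
        **fit_kwargs
    )
```

For example, if the previous job finished 5 epochs of a planned 20, resume_training(model, X_train, y_train, 5, 20, batch_size=batch_size) would run the remaining 15 and report them as epochs 6 through 20.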

Henryk Borzymowski answered Nov 08 '22 21:11