I'm using Keras as a submodule of TensorFlow v2, and I'm training my model with the fit_generator() method. I want to save my model every 10 epochs. How can I achieve this?
In standalone Keras (not as a submodule of tf), I can pass ModelCheckpoint(model_savepath, period=10). But in TF v2 this has changed to ModelCheckpoint(model_savepath, save_freq), where save_freq can be 'epoch', in which case the model is saved every epoch, or an integer, in which case the model is saved after that many samples have been processed. But I want it saved after every 10 epochs. How can I achieve this?
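To illustrate the two behaviours (the paths are placeholders, and the integer interval is counted in samples on early TF 2.0 and in batches on later versions):

import tensorflow as tf

# Saves at the end of every epoch:
cb_epoch = tf.keras.callbacks.ModelCheckpoint("ckpt/model.h5", save_freq="epoch")

# Saves on a fixed interval rather than on an epoch boundary:
cb_interval = tf.keras.callbacks.ModelCheckpoint("ckpt/model.h5", save_freq=1000)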
To save weights every epoch, you can use what Keras calls callbacks. Create checkpoint = ModelCheckpoint(...) and set the 'period' argument to 1, which sets the periodicity in epochs. This should do it.
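A minimal sketch of that (the path pattern, model, and data are placeholders; period is deprecated or removed on recent releases, so this applies to older Keras/TF versions):

import numpy as np
from tensorflow import keras

# Minimal model and data so the snippet runs end to end.
model = keras.Sequential([keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer="adam", loss="mse")
x, y = np.random.rand(100, 4), np.random.rand(100, 1)

checkpoint = keras.callbacks.ModelCheckpoint(
    "weights.{epoch:02d}.h5",   # placeholder path pattern
    save_weights_only=True,
    period=1)                   # legacy arg: save every epoch

model.fit(x, y, epochs=3, callbacks=[checkpoint], verbose=0)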
There are two formats you can use to save an entire model to disk: the TensorFlow SavedModel format, and the older Keras H5 format. The recommended format is SavedModel. It is the default when you use model.save().
The save_weights() method saves only the weights of the layers contained in the model. To save a whole model with TensorFlow it is advised to use the save() method (e.g. to an H5 file) rather than save_weights(); however, weights alone can also be written to an H5 file with save_weights().
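A short sketch contrasting the two (the model and paths are illustrative):

import numpy as np
from tensorflow import keras

# Tiny model so the snippet runs end to end.
model = keras.Sequential([keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer="adam", loss="mse")
model.fit(np.random.rand(10, 4), np.random.rand(10, 1), verbose=0)

model.save("full_model.h5")            # architecture + weights + optimizer state
model.save_weights("weights_only.h5")  # layer weights only

restored = keras.models.load_model("full_model.h5")  # rebuilds everything
model.load_weights("weights_only.h5")  # needs an already-built, matching model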
SavedModel is the more comprehensive save format: it saves the model architecture, the weights, and the traced TensorFlow subgraphs of the call functions. This enables Keras to restore both built-in layers and custom objects.
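A minimal sketch of that flow (a toy model stands in for a real one; `my_model` is the output folder):

import numpy as np
from tensorflow import keras

# Create a simple model.
model = keras.Sequential([keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer="adam", loss="mse")

# Train the model.
model.fit(np.random.rand(100, 4), np.random.rand(100, 1), verbose=0)

# Calling `save('my_model')` creates a SavedModel folder `my_model`.
model.save("my_model")

# It can be reloaded later, custom objects included.
reconstructed = keras.models.load_model("my_model")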
The function name is sufficient for loading as long as it is registered as a custom object. It's also possible to load the low-level TensorFlow graph that Keras generated; if you do so, you won't need to provide any custom_objects. You can do so like this:
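(A sketch, assuming the `my_model` SavedModel folder from above:)

import tensorflow as tf

# Loads the raw TensorFlow graph from the SavedModel folder instead of a
# Keras model object; no custom_objects mapping is needed at this level.
loaded = tf.saved_model.load("my_model")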
Also, saving every N epochs is not an option for me. What I am trying to do is save the model after some specific epochs are done. Let's say, for example, after epoch = 150 is over it will be saved as model.save('model_1.h5'), and after epoch = 152 it will be saved as model.save('model_2.h5'), etc., for a few specific epochs.
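One way to do that (a sketch of my own, not from the thread; the class name, epoch numbers, and paths are illustrative) is a small custom callback that checks the epoch index:

import tensorflow as tf

class SaveAtEpochs(tf.keras.callbacks.Callback):
    """Saves the full model whenever one of the given (1-based) epochs ends."""

    def __init__(self, epochs_to_save):
        super().__init__()
        self.epochs_to_save = set(epochs_to_save)
        self.saved_count = 0

    def on_epoch_end(self, epoch, logs=None):
        # `epoch` is 0-based inside Keras callbacks.
        if epoch + 1 in self.epochs_to_save:
            self.saved_count += 1
            self.model.save("model_%d.h5" % self.saved_count)

# e.g.: model.fit(..., callbacks=[SaveAtEpochs([150, 152])])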
Using tf.keras.callbacks.ModelCheckpoint, use save_freq='epoch' and pass the extra argument period=10. Although this is not explained in the official docs, that is the way to do it (the docs mention that you can pass period, they just don't explain what it does).
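A minimal sketch of that (the path is a placeholder; period is accepted via **kwargs on at least some TF 2.x releases, typically with a deprecation warning):

import tensorflow as tf

checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "ckpt/weights.{epoch:03d}.h5",  # placeholder path pattern
    save_freq="epoch",
    period=10)  # undocumented: checkpoints every 10th epoch

# model.fit(..., callbacks=[checkpoint])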
Explicitly computing the number of batches per epoch worked for me.
BATCH_SIZE = 20
STEPS_PER_EPOCH = train_labels.size // BATCH_SIZE  # integer batches per epoch
SAVE_PERIOD = 10

# Create a callback that saves the model's weights every 10 epochs.
# save_freq is measured in batches here, so convert epochs to batches.
cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path,
    verbose=1,
    save_weights_only=True,
    save_freq=SAVE_PERIOD * STEPS_PER_EPOCH)

# Train the model with the new callback.
model.fit(train_images,
          train_labels,
          batch_size=BATCH_SIZE,
          steps_per_epoch=STEPS_PER_EPOCH,
          epochs=50,
          callbacks=[cp_callback],
          validation_data=(test_images, test_labels),
          verbose=0)
The period param mentioned in the accepted answer is no longer available.
Using the save_freq param is an alternative, but risky, as mentioned in the docs; e.g., if the dataset size changes, the save points may drift away from epoch boundaries: "Note that if the saving isn't aligned to epochs, the monitored metric may potentially be less reliable" (again taken from the docs).
Thus, I use a subclass as a solution:
class EpochModelCheckpoint(tf.keras.callbacks.ModelCheckpoint):
    """ModelCheckpoint that saves only every `frequency` epochs."""

    def __init__(self,
                 filepath,
                 frequency=1,
                 monitor='val_loss',
                 verbose=0,
                 save_best_only=False,
                 save_weights_only=False,
                 mode='auto',
                 options=None,
                 **kwargs):
        super(EpochModelCheckpoint, self).__init__(filepath, monitor, verbose, save_best_only,
                                                   save_weights_only, mode, "epoch", options)
        self.epochs_since_last_save = 0
        self.frequency = frequency

    def on_epoch_end(self, epoch, logs=None):
        self.epochs_since_last_save += 1
        # pylint: disable=protected-access
        if self.epochs_since_last_save % self.frequency == 0:
            self._save_model(epoch=epoch, batch=None, logs=logs)

    def on_train_batch_end(self, batch, logs=None):
        # Override the parent's per-batch hook so nothing happens mid-epoch.
        pass
Use it as:

callbacks=[
    EpochModelCheckpoint("/your_save_location/epoch{epoch:02d}", frequency=10),
]
Note that, depending on your TF version, you may have to change the args in the call to the superclass __init__.
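One defensive variant (my own suggestion, not from the original answer) is to pass everything by keyword, so positional-argument reshuffles between TF versions can't silently break the call:

# Inside EpochModelCheckpoint.__init__, replacing the positional super() call:
super().__init__(
    filepath=filepath,
    monitor=monitor,
    verbose=verbose,
    save_best_only=save_best_only,
    save_weights_only=save_weights_only,
    mode=mode,
    save_freq="epoch",
    options=options)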