Save and load model optimizer state

Tags:

I have a set of fairly complicated models that I am training and I am looking for a way to save and load the model optimizer states. The "trainer models" consist of different combinations of several other "weight models", of which some have shared weights, some have frozen weights depending on the trainer, etc. It is a bit too complicated of an example to share, but in short, I am not able to use model.save('model_file.h5') and keras.models.load_model('model_file.h5') when stopping and starting my training.

Using model.load_weights('weight_file.h5') works fine for testing my model if the training has finished, but if I attempt to continue training the model using this method, the loss does not come even close to returning to its last location. I have read that this is because the optimizer state is not saved using this method which makes sense. However, I need a method for saving and loading the states of the optimizers of my trainer models. It seems as though keras once had a model.optimizer.get_sate() and model.optimizer.set_sate() that would accomplish what I am after, but that does not seem to be the case anymore (at least for the Adam optimizer). Are there any other solutions with the current Keras?

745

asked Mar 27 '18 03:03

Starnetter

Video Answer

4 Answers

You can extract the important lines from the load_model and save_model functions.

For saving optimizer states, in save_model:

# Save optimizer weights.
symbolic_weights = getattr(model.optimizer, 'weights')
if symbolic_weights:
    optimizer_weights_group = f.create_group('optimizer_weights')
    weight_values = K.batch_get_value(symbolic_weights)

For loading optimizer states, in load_model:

# Set optimizer weights.
if 'optimizer_weights' in f:
    # Build train function (to get weight updates).
    if isinstance(model, Sequential):
        model.model._make_train_function()
    else:
        model._make_train_function()

    # ...

    try:
        model.optimizer.set_weights(optimizer_weight_values)

Combining the lines above, here's an example:

First fit the model for 5 epochs.

X, y = np.random.rand(100, 50), np.random.randint(2, size=100)
x = Input((50,))
out = Dense(1, activation='sigmoid')(x)
model = Model(x, out)
model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit(X, y, epochs=5)

Epoch 1/5
100/100 [==============================] - 0s 4ms/step - loss: 0.7716
Epoch 2/5
100/100 [==============================] - 0s 64us/step - loss: 0.7678
Epoch 3/5
100/100 [==============================] - 0s 82us/step - loss: 0.7665
Epoch 4/5
100/100 [==============================] - 0s 56us/step - loss: 0.7647
Epoch 5/5
100/100 [==============================] - 0s 76us/step - loss: 0.7638

Now save the weights and optimizer states.

model.save_weights('weights.h5')
symbolic_weights = getattr(model.optimizer, 'weights')
weight_values = K.batch_get_value(symbolic_weights)
with open('optimizer.pkl', 'wb') as f:
    pickle.dump(weight_values, f)

Rebuild the model in another python session, and load weights.

x = Input((50,))
out = Dense(1, activation='sigmoid')(x)
model = Model(x, out)
model.compile(optimizer='adam', loss='binary_crossentropy')

model.load_weights('weights.h5')
model._make_train_function()
with open('optimizer.pkl', 'rb') as f:
    weight_values = pickle.load(f)
model.optimizer.set_weights(weight_values)

Continue model training.

model.fit(X, y, epochs=5)

Epoch 1/5
100/100 [==============================] - 0s 674us/step - loss: 0.7629
Epoch 2/5
100/100 [==============================] - 0s 49us/step - loss: 0.7617
Epoch 3/5
100/100 [==============================] - 0s 49us/step - loss: 0.7611
Epoch 4/5
100/100 [==============================] - 0s 55us/step - loss: 0.7601
Epoch 5/5
100/100 [==============================] - 0s 49us/step - loss: 0.7594

163

answered Oct 16 '22 20:10

Yu-Yang

For those who are not using model.compile and instead performing automatic differentiation to apply the gradients manually with optimizer.apply_gradients, I think I have a solution.

First, save the optimizer weights: np.save(path, optimizer.get_weights())

Then, when you are ready to reload the optimizer, show the newly instantiated optimizer the size of the weights it will update by calling optimizer.apply_gradients on a list of tensors of the size of the variables for which you calculate gradients. It is extremely important to then set the weights of the model AFTER you set the weights of the optimizer because momentum-based optimizers like Adam will update the weights of the model even if we give it gradients which are zero.

import tensorflow as tf
import numpy as np

model = # instantiate model (functional or subclass of tf.keras.Model)

# Get saved weights
opt_weights = np.load('/path/to/saved/opt/weights.npy', allow_pickle=True)

grad_vars = model.trainable_weights
# This need not be model.trainable_weights; it must be a correctly-ordered list of 
# grad_vars corresponding to how you usually call the optimizer.

optimizer = tf.keras.optimizers.Adam(lrate)

zero_grads = [tf.zeros_like(w) for w in grad_vars]

# Apply gradients which don't do nothing with Adam
optimizer.apply_gradients(zip(zero_grads, grad_vars))

# Set the weights of the optimizer
optimizer.set_weights(opt_weights)

# NOW set the trainable weights of the model
model_weights = np.load('/path/to/saved/model/weights.npy', allow_pickle=True)
model.set_weights(model_weights)

Note that if we try to set the weights before calling apply_gradients for the first time, an error is thrown that the optimizer expects a weight list of length zero.

answered Oct 16 '22 21:10

Alex Trevithick

Completing Alex Trevithick answer, it is possible to avoid re calling model.set_weights, simply by saving the state of the variables before applying the gradient and then reloading. This can useful when loading a model from an h5 file, and looks cleaner (imo).

The saving/loading functions are the following (thanks Alex again):

def save_optimizer_state(optimizer, save_path, save_name):
    '''
    Save keras.optimizers object state.

    Arguments:
    optimizer --- Optimizer object.
    save_path --- Path to save location.
    save_name --- Name of the .npy file to be created.

    '''

    # Create folder if it does not exists
    if not os.path.exists(save_path):
        os.makedirs(save_path)
    
    # save weights
    np.save(os.path.join(save_path, save_name), optimizer.get_weights())

    return

def load_optimizer_state(optimizer, load_path, load_name, model_train_vars):
    '''
    Loads keras.optimizers object state.

    Arguments:
    optimizer --- Optimizer object to be loaded.
    load_path --- Path to save location.
    load_name --- Name of the .npy file to be read.
    model_train_vars --- List of model variables (obtained using Model.trainable_variables)

    '''

    # Load optimizer weights
    opt_weights = np.load(os.path.join(load_path, load_name)+'.npy', allow_pickle=True)

    # dummy zero gradients
    zero_grads = [tf.zeros_like(w) for w in model_train_vars]
    # save current state of variables
    saved_vars = [tf.identity(w) for w in model_train_vars]

    # Apply gradients which don't do nothing with Adam
    optimizer.apply_gradients(zip(zero_grads, model_train_vars))

    # Reload variables
    [x.assign(y) for x,y in zip(model_train_vars, saved_vars)]

    # Set the weights of the optimizer
    optimizer.set_weights(opt_weights)


    return

answered Oct 16 '22 20:10

Ramiro R.C.

upgrading Keras to 2.2.4 and using pickle solved this issue for me. with keras release 2.2.3 Keras models can now be safely pickled.

answered Oct 16 '22 20:10

ismail

Related questions
                            
                                How to get the font pixel height using PIL's ImageFont class?
                            
                                Python equivalent of Scala case class
                            
                                Opening web camera in Google Colab
                            
                                Shapefile reader in Python?
                            
                                Executing a Django Shell Command from the Command Line
                            
                                Why don't scripting languages output Unicode to the Windows console?
                            
                                Pyramid authorization for stored items
                            
                                Extracting words from a string, removing punctuation and returning a list with separated words
                            
                                SyntaxError: cannot assign to operator
                            
                                Creating a Python list comprehension with an if and break
                            
                                Creating a dictionary with list of lists in Python
                            
                                Timedelta is not defined
                            
                                Python, write in memory zip to file
                            
                                Calculate sunrise and sunset times for a given GPS coordinate within PostgreSQL
                            
                                Delete an uploaded file after downloading it from Flask
                            
                                Optimize the performance of dictionary membership for a list of Keys
                            
                                Sklearn preprocessing - PolynomialFeatures - How to keep column names/headers of the output array / dataframe
                            
                                Changing the scale of a tensor in tensorflow
                            
                                Pandas dataframe error: matplotlib.axes._subplots.AxesSubplot
                            
                                Replace nan values in tensorflow tensor

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Save and load model optimizer state

Tags:

python

machine-learning

tensorflow

keras

Starnetter

People also ask

Video Answer

4 Answers

Yu-Yang

Alex Trevithick

Ramiro R.C.

ismail

Recent Activity

Donate For Us