Keras: does save_model really save all optimizer weights?

Tags:

keras

Suppose you have a Keras model with an optimizer like Adam that you save via save_model. If you load the model again with load_model, does it really load ALL optimizer parameters + weights?

Based on the code of save_model (Link), Keras saves the config of the optimizer:

f.attrs['training_config'] = json.dumps({
    'optimizer_config': {
        'class_name': model.optimizer.__class__.__name__,
        'config': model.optimizer.get_config()},

which, in the case of Adam for example (Link), is as follows:

def get_config(self):
    config = {'lr': float(K.get_value(self.lr)),
              'beta_1': float(K.get_value(self.beta_1)),
              'beta_2': float(K.get_value(self.beta_2)),
              'decay': float(K.get_value(self.decay)),
              'epsilon': self.epsilon}

As such, this only saves the fundamental parameters but no per-variable optimizer weights.

However, after dumping the config in save_model, it looks like some optimizer weights are saved as well (Link). Unfortunately, I can't tell whether every weight of the optimizer is saved.

So if you want to continue training the model in a new session with load_model, is the state of the optimizer really 100% the same as in the last training session? E.g. in the case of SGD with momentum, does it save all per-variable momentums?

Or in general, does it make a difference in training if you stop and resume training with save/load_model?


asked Dec 08 '17 14:12

0vbb

1 Answer

It seems your links no longer point to the same lines they pointed to at the time of your question, so I don't know which lines you are referring to.

But the answer is yes, the entire state of the optimizer is saved along with the model. You can see this happening in save_model(). If you do not want to save the optimizer weights, pass include_optimizer=False to save_model().

If you inspect the resulting *.h5 file, for example by means of h5dump | less, you can see those weights. (h5dump is one of the command-line tools that ship with HDF5.)

Therefore, saving a model and loading it again later should make no difference in many common cases. However, there are exceptions not related to the optimizer. One that comes to mind is an LSTM(stateful=True) layer, which I believe does not save the internal LSTM states when calling save_model(). There are possibly many more reasons why interrupting training with save/load might not produce exactly the same results as training without interruption, but investigating them probably only makes sense in the context of concrete code.
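
To illustrate why the per-variable optimizer state matters when resuming, here is a minimal pure-Python sketch of SGD with momentum. It is not Keras code: the toy loss, the function sgd_momentum_steps, and the JSON "checkpoint" are all made up for this example. It compares an uninterrupted run with two resumed runs, one that restores the velocities and one that resets them, roughly what I would expect after saving with include_optimizer=False.

```python
import json


def grad(w):
    # Gradient of the toy loss f(w) = sum(w_i ** 2) / 2 (assumed for illustration).
    return list(w)


def sgd_momentum_steps(weights, velocity, steps, lr=0.1, momentum=0.9):
    """Run `steps` updates of SGD with momentum; return (weights, velocity)."""
    w, v = list(weights), list(velocity)
    for _ in range(steps):
        g = grad(w)
        # One velocity entry per weight: this is the per-variable optimizer state.
        v = [momentum * vi - lr * gi for vi, gi in zip(v, g)]
        w = [wi + vi for wi, vi in zip(w, v)]
    return w, v


start_w, start_v = [1.0, -2.0], [0.0, 0.0]

# Reference run: 10 steps without interruption.
ref_w, _ = sgd_momentum_steps(start_w, start_v, 10)

# Interrupted run: 5 steps, then write a hypothetical checkpoint.
mid_w, mid_v = sgd_momentum_steps(start_w, start_v, 5)
checkpoint = json.dumps({"weights": mid_w, "velocity": mid_v})

# Resume with the full state (weights and velocities restored).
restored = json.loads(checkpoint)
full_w, _ = sgd_momentum_steps(restored["weights"], restored["velocity"], 5)

# Resume with the weights only (velocities reset to zero).
reset_w, _ = sgd_momentum_steps(restored["weights"], [0.0, 0.0], 5)

print("full state matches uninterrupted run:", full_w == ref_w)
print("weights-only matches uninterrupted run:", reset_w == ref_w)
```

In this toy setting, only the run that restores the velocities reproduces the uninterrupted result. The weights-only run still converges, but along a different trajectory.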

answered Sep 20 '22 06:09

jlh
