I have a model that I've trained for 40 epochs. I kept a checkpoint for each epoch, and I also saved the model with <code>model.save()</code>. The training code is:

<pre class="prettyprint"><code>from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense
from keras.callbacks import ModelCheckpoint

# x, y and vec_size are defined elsewhere
n_units = 1000
model = Sequential()
model.add(LSTM(n_units, input_shape=(None, vec_size), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(n_units, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(n_units))
model.add(Dropout(0.2))
model.add(Dense(vec_size, activation='linear'))
model.compile(loss='mean_squared_error', optimizer='adam')

# define the checkpoint
filepath = "word2vec-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]

# fit the model
model.fit(x, y, epochs=40, batch_size=50, callbacks=callbacks_list)
</code></pre>

However, when I load the model and try to train it again, it starts all over as if it had never been trained. The loss does not continue from where the last training run ended.

What confuses me is that when I redefine the model structure and load the weights with <code>load_weights</code>, <code>model.predict()</code> works well, so I believe the model weights are loaded:

<pre class="prettyprint"><code>model = Sequential()
model.add(LSTM(n_units, input_shape=(None, vec_size), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(n_units, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(n_units))
model.add(Dropout(0.2))
model.add(Dense(vec_size, activation='linear'))

filename = "word2vec-39-0.0027.hdf5"
model.load_weights(filename)
model.compile(loss='mean_squared_error', optimizer='adam')
</code></pre>

However, when I continue training with this, the loss is as high as it was at the initial stage:

<pre class="prettyprint"><code>filepath = "word2vec-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]

# fit the model
model.fit(x, y, epochs=40, batch_size=50, callbacks=callbacks_list)
</code></pre>

I searched and found some examples of saving and loading models here and here. However, none of them worked.

<hr>

Update 1

I looked at this question, tried it, and it works:

<pre class="prettyprint"><code>from keras.models import load_model

model.save('partly_trained.h5')
del model
model = load_model('partly_trained.h5')
</code></pre>

But when I close Python, reopen it, and run <code>load_model</code> again, it fails. The loss is as high as in the initial state.

<hr>

Update 2

I tried Yu-Yang's example code and it works. However, when I use my own code again, it still fails.

This is the result from the original training. The second epoch should start with loss = 3.1***:

<pre class="prettyprint"><code>13700/13846 [============================>.] - ETA: 0s - loss: 3.0519
13750/13846 [============================>.] - ETA: 0s - loss: 3.0511
13800/13846 [============================>.] - ETA: 0s - loss: 3.0512Epoch 00000: loss improved from inf to 3.05101, saving model to LPT-00-3.0510.h5
13846/13846 [==============================] - 81s - loss: 3.0510
Epoch 2/60
   50/13846 [..............................] - ETA: 80s - loss: 3.1754
  100/13846 [..............................] - ETA: 78s - loss: 3.1174
  150/13846 [..............................] - ETA: 78s - loss: 3.0745
</code></pre>

I closed Python, reopened it, loaded the model with <code>model = load_model("LPT-00-3.0510.h5")</code>, and then trained with:

<pre class="prettyprint"><code>filepath = "LPT-{epoch:02d}-{loss:.4f}.h5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]

# fit the model
model.fit(x, y, epochs=60, batch_size=50, callbacks=callbacks_list)
</code></pre>

The loss starts at 4.54:

<pre class="prettyprint"><code>Epoch 1/60
   50/13846 [..............................] - ETA: 162s - loss: 4.5451
  100/13846 [..............................] - ETA: 113s - loss: 4.3835
</code></pre>

As it's quite difficult to pinpoint where the problem is, I created a toy example from your code, and it seems to work all right.

<pre class="prettyprint"><code>import numpy as np
from numpy.testing import assert_allclose
from keras.models import Sequential, load_model
from keras.layers import LSTM, Dropout, Dense
from keras.callbacks import ModelCheckpoint

vec_size = 100
n_units = 10

x_train = np.random.rand(500, 10, vec_size)
y_train = np.random.rand(500, vec_size)

model = Sequential()
model.add(LSTM(n_units, input_shape=(None, vec_size), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(n_units, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(n_units))
model.add(Dropout(0.2))
model.add(Dense(vec_size, activation='linear'))
model.compile(loss='mean_squared_error', optimizer='adam')

# define the checkpoint
filepath = "model.h5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]

# fit the model
model.fit(x_train, y_train, epochs=5, batch_size=50, callbacks=callbacks_list)

# load the model
new_model = load_model(filepath)
assert_allclose(model.predict(x_train),
                new_model.predict(x_train),
                1e-5)

# fit the model
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
new_model.fit(x_train, y_train, epochs=5, batch_size=50, callbacks=callbacks_list)
</code></pre>

The loss continues to decrease after the model is loaded. (Restarting Python causes no problem either.)

<pre class="prettyprint"><code>Using TensorFlow backend.
Epoch 1/5
500/500 [==============================] - 2s - loss: 0.3216
Epoch 00000: loss improved from inf to 0.32163, saving model to model.h5
Epoch 2/5
500/500 [==============================] - 0s - loss: 0.2923
Epoch 00001: loss improved from 0.32163 to 0.29234, saving model to model.h5
Epoch 3/5
500/500 [==============================] - 0s - loss: 0.2542
Epoch 00002: loss improved from 0.29234 to 0.25415, saving model to model.h5
Epoch 4/5
500/500 [==============================] - 0s - loss: 0.2086
Epoch 00003: loss improved from 0.25415 to 0.20860, saving model to model.h5
Epoch 5/5
500/500 [==============================] - 0s - loss: 0.1725
Epoch 00004: loss improved from 0.20860 to 0.17249, saving model to model.h5

Epoch 1/5
500/500 [==============================] - 0s - loss: 0.1454
Epoch 00000: loss improved from inf to 0.14543, saving model to model.h5
Epoch 2/5
500/500 [==============================] - 0s - loss: 0.1289
Epoch 00001: loss improved from 0.14543 to 0.12892, saving model to model.h5
Epoch 3/5
500/500 [==============================] - 0s - loss: 0.1169
Epoch 00002: loss improved from 0.12892 to 0.11694, saving model to model.h5
Epoch 4/5
500/500 [==============================] - 0s - loss: 0.1097
Epoch 00003: loss improved from 0.11694 to 0.10971, saving model to model.h5
Epoch 5/5
500/500 [==============================] - 0s - loss: 0.1057
Epoch 00004: loss improved from 0.10971 to 0.10570, saving model to model.h5
</code></pre>

BTW, redefining the model and then calling <code>load_weights()</code> won't resume training where it left off, because <code>save_weights()</code> and <code>load_weights()</code> do not save or load the optimizer state.
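To illustrate why the optimizer state matters when resuming, here is a minimal sketch in plain Python. It is not Keras code: it is a hand-written scalar Adam update on a toy loss, and the names (`adam_step`, `train`, `fresh_state`) and the hyperparameters are assumptions chosen only for illustration. It compares an uninterrupted run with two resumed runs, one that restores the optimizer's moments and step count and one that restores only the weight.

```python
import copy
import math


def adam_step(w, grad, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """Apply one Adam update to the scalar w; state holds step count and moments."""
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * grad
    state["v"] = b2 * state["v"] + (1 - b2) * grad * grad
    m_hat = state["m"] / (1 - b1 ** state["t"])  # bias-corrected first moment
    v_hat = state["v"] / (1 - b2 ** state["t"])  # bias-corrected second moment
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)


def train(w, state, steps):
    """Minimise the toy loss (w - 3)^2 for the given number of steps."""
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)
        w = adam_step(w, grad, state)
    return w


def fresh_state():
    """Optimizer state as it would be right after compiling a new model."""
    return {"t": 0, "m": 0.0, "v": 0.0}


# Reference: 20 uninterrupted steps.
w_reference = train(0.0, fresh_state(), 20)

# Interrupted run: stop after 10 steps and keep a copy of the optimizer state.
state = fresh_state()
w_half = train(0.0, state, 10)
saved_state = copy.deepcopy(state)

# Resume with weight AND optimizer state (analogous to save() + load_model()).
w_full_restore = train(w_half, copy.deepcopy(saved_state), 10)

# Resume with the weight only and a fresh optimizer
# (analogous to load_weights() followed by compile()).
w_weights_only = train(w_half, fresh_state(), 10)

print("uninterrupted:", w_reference)
print("full restore: ", w_full_restore)
print("weights only: ", w_weights_only)
```

In this toy setting the fully restored run follows the uninterrupted trajectory, while the weights-only run diverges from it because Adam's moment estimates start again from zero. How large that effect is for a real model depends on the model, the data, and the optimizer.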

Keras: How to save model and continue training?

How do I save my model during training in Keras?

The ModelCheckpoint callback saves the Keras model or model weights at some frequency. It is used in conjunction with training via model.fit() to save a model or weights (in a checkpoint file) at some interval, so that the model or weights can be loaded later to continue training from the saved state.
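When checkpoints are written with an epoch-numbered filename such as the question's "word2vec-{epoch:02d}-{loss:.4f}.hdf5", resuming usually starts with finding the most recent file. The following is a small sketch using only the standard library; the helper name `latest_checkpoint` is hypothetical and not part of Keras.

```python
import re
from pathlib import Path

# Matches names produced by the pattern "word2vec-{epoch:02d}-{loss:.4f}.hdf5".
CHECKPOINT_RE = re.compile(r"^word2vec-(\d+)-(\d+\.\d+)\.hdf5$")


def latest_checkpoint(directory):
    """Return (path, epoch, loss) for the highest-numbered checkpoint, or None."""
    best = None
    for path in Path(directory).iterdir():
        match = CHECKPOINT_RE.match(path.name)
        if match is None:
            continue  # not a checkpoint file
        epoch, loss = int(match.group(1)), float(match.group(2))
        if best is None or epoch > best[1]:
            best = (path, epoch, loss)
    return best


# Hypothetical usage with Keras (not executed here):
#   found = latest_checkpoint(".")
#   if found is not None:
#       model = load_model(str(found[0]))
```

The returned epoch number could then inform the `initial_epoch` argument of `model.fit()`. Whether the number in the filename counts from 0 or from 1 depends on the Keras version (the question's log shows the first epoch saved as 00), so check this before relying on it.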

How do you save and load a model and continue training using an HDF5 file in Keras?

Another way of saving models is to call the save() method on the model. This will create an HDF5 formatted file. The save method saves additional data, like the model's configuration and even the state of the optimizer. A model that was saved using the save() method can be loaded with the function keras.

Tags: python, keras

David

Yu-Yang
