I have a model that I've trained for 40 epochs. I kept checkpoints for each epoch, and I have also saved the model with `model.save()`. The code for training is:
```python
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense
from keras.callbacks import ModelCheckpoint

# x, y and vec_size come from preprocessing that is not shown here
n_units = 1000
model = Sequential()
model.add(LSTM(n_units, input_shape=(None, vec_size), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(n_units, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(n_units))
model.add(Dropout(0.2))
model.add(Dense(vec_size, activation='linear'))
model.compile(loss='mean_squared_error', optimizer='adam')

# define the checkpoint
filepath = "word2vec-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]

# fit the model
model.fit(x, y, epochs=40, batch_size=50, callbacks=callbacks_list)
```
However, when I load the model and try training it again, it starts all over as if it hasn't been trained before: the loss doesn't continue from where the last training run ended.
What confuses me is that when I redefine the model structure and load the saved weights with `load_weights()`, `model.predict()` works well. Thus, I believe the model weights are loaded:
```python
model = Sequential()
model.add(LSTM(n_units, input_shape=(None, vec_size), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(n_units, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(n_units))
model.add(Dropout(0.2))
model.add(Dense(vec_size, activation='linear'))

filename = "word2vec-39-0.0027.hdf5"
model.load_weights(filename)
model.compile(loss='mean_squared_error', optimizer='adam')
```
However, when I continue training with this model, the loss is as high as in the initial stage:
```python
filepath = "word2vec-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]

# fit the model
model.fit(x, y, epochs=40, batch_size=50, callbacks=callbacks_list)
```
I searched and found some examples of saving and loading models here and here. However, none of them worked.
Update 1
I looked at this question, tried it and it works:
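```python
from keras.models import load_model

model.save('partly_trained.h5')
del model
# load_model returns the restored model, so it has to be assigned
model = load_model('partly_trained.h5')
```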
But when I close Python, reopen it, and run `load_model` again, it fails: the loss is as high as in the initial state.
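One way to check whether the saved file itself survives a fresh session is to do the reload in a separate interpreter. The following is a minimal sketch of that check, assuming TensorFlow 2.x (`tf.keras`); the small Dense model, the random data and the file names `partly_trained.keras`, `x_check.npy` and `pred_check.npy` are hypothetical stand-ins for the question's setup. The child process plays the role of "close Python and reopen it": it reloads the model, confirms the predictions match, and reports how many optimizer steps were restored.

```python
# Minimal sketch, assuming TensorFlow 2.x (tf.keras). The model, the random data
# and all file names are hypothetical stand-ins for the question's setup.
import subprocess
import sys
import textwrap

import numpy as np
import tensorflow as tf

x = np.random.rand(100, 8).astype("float32")
y = np.random.rand(100, 1).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(loss="mean_squared_error", optimizer="adam")
model.fit(x, y, epochs=2, batch_size=50, verbose=0)  # 2 batches per epoch

# Save the model plus the inputs and predictions the new session should reproduce.
model.save("partly_trained.keras")
np.save("x_check.npy", x)
np.save("pred_check.npy", model.predict(x, verbose=0))

# A separate interpreter stands in for "close Python and reopen it".
child_code = textwrap.dedent("""
    import numpy as np
    import tensorflow as tf

    model = tf.keras.models.load_model("partly_trained.keras")
    x = np.load("x_check.npy")
    # Exits non-zero if the reloaded weights give different predictions.
    np.testing.assert_allclose(model.predict(x, verbose=0),
                               np.load("pred_check.npy"), rtol=1e-5)
    print(int(model.optimizer.iterations.numpy()))
""")
result = subprocess.run([sys.executable, "-c", child_code],
                        capture_output=True, text=True, check=True)
restored_steps = int(result.stdout.strip().splitlines()[-1])
print("optimizer steps restored in the new session:", restored_steps)
```

If a check like this passes but training in a new session still restarts from a high loss, the saved file is probably not the cause, so whatever else differs between sessions would be the next thing to compare.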
Update 2
I tried Yu-Yang's example code and it works. However, when I use my own code again, it still fails.
This is the result from the original training. The second epoch should start with a loss of around 3.1:
```
13700/13846 [============================>.] - ETA: 0s - loss: 3.0519
13750/13846 [============================>.] - ETA: 0s - loss: 3.0511
13800/13846 [============================>.] - ETA: 0s - loss: 3.0512
Epoch 00000: loss improved from inf to 3.05101, saving model to LPT-00-3.0510.h5
13846/13846 [==============================] - 81s - loss: 3.0510
Epoch 2/60
50/13846 [..............................] - ETA: 80s - loss: 3.1754
100/13846 [..............................] - ETA: 78s - loss: 3.1174
150/13846 [..............................] - ETA: 78s - loss: 3.0745
```
I closed Python, reopened it, loaded the model with `model = load_model("LPT-00-3.0510.h5")`, and then trained it with:
```python
filepath = "LPT-{epoch:02d}-{loss:.4f}.h5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]

# fit the model
model.fit(x, y, epochs=60, batch_size=50, callbacks=callbacks_list)
```
The loss starts at 4.54:
```
Epoch 1/60
50/13846 [..............................] - ETA: 162s - loss: 4.5451
100/13846 [..............................] - ETA: 113s - loss: 4.3835
```
Callback to save the Keras model or model weights at some frequency. The `ModelCheckpoint` callback is used in conjunction with training via `model.fit()` to save a model or its weights (in a checkpoint file) at some interval, so that the model or weights can be loaded later to continue training from the saved state.
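Resuming from a checkpoint this way relies on the callback saving the full model (`save_weights_only=False`, the default) rather than weights alone. A minimal sketch, assuming TensorFlow 2.x (`tf.keras`), with a hypothetical tiny Dense model, random data and the file name `ckpt.keras` in place of the question's setup: train for three epochs, reload the checkpoint, and pass `initial_epoch` to `fit()` so epoch numbering (and any `{epoch}` placeholder in the checkpoint file name) continues instead of restarting at 1.

```python
# Minimal sketch, assuming TensorFlow 2.x (tf.keras). The tiny Dense model,
# the random data and "ckpt.keras" are hypothetical stand-ins.
import numpy as np
import tensorflow as tf

x = np.random.rand(100, 8).astype("float32")
y = np.random.rand(100, 1).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(loss="mean_squared_error", optimizer="adam")

# save_weights_only=False (the default) writes the whole model, including the
# optimizer state; save_best_only=False saves after every epoch.
checkpoint = tf.keras.callbacks.ModelCheckpoint("ckpt.keras", save_best_only=False)
model.fit(x, y, epochs=3, batch_size=50, callbacks=[checkpoint], verbose=0)

# Later, possibly in a new Python session: restore the last checkpoint and
# continue with epochs 3, 4, 5 instead of starting again from 0.
resumed = tf.keras.models.load_model("ckpt.keras")
history = resumed.fit(x, y, initial_epoch=3, epochs=6, batch_size=50,
                      callbacks=[checkpoint], verbose=0)

print("epochs run after resuming:", history.epoch)
print("total optimizer steps:", int(resumed.optimizer.iterations.numpy()))
```

`save_best_only=False` is used here so a checkpoint exists for the last epoch regardless of whether the loss improved; with `save_best_only=True`, as in the question, the file on disk holds the best epoch so far, which is not necessarily the latest one. The `.keras` extension is used because recent Keras versions expect it for full-model checkpoints.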
Another way of saving models is to call the save() method on the model. This will create an HDF5 formatted file. The save method saves additional data, like the model's configuration and even the state of the optimizer. A model that was saved using the save() method can be loaded with the function keras.
As it's quite difficult to clarify where the problem is, I created a toy example from your code, and it seems to work alright.
```python
import numpy as np
from numpy.testing import assert_allclose
from keras.models import Sequential, load_model
from keras.layers import LSTM, Dropout, Dense
from keras.callbacks import ModelCheckpoint

vec_size = 100
n_units = 10

x_train = np.random.rand(500, 10, vec_size)
y_train = np.random.rand(500, vec_size)

model = Sequential()
model.add(LSTM(n_units, input_shape=(None, vec_size), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(n_units, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(n_units))
model.add(Dropout(0.2))
model.add(Dense(vec_size, activation='linear'))
model.compile(loss='mean_squared_error', optimizer='adam')

# define the checkpoint
filepath = "model.h5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]

# fit the model
model.fit(x_train, y_train, epochs=5, batch_size=50, callbacks=callbacks_list)

# load the model
new_model = load_model(filepath)
assert_allclose(model.predict(x_train),
                new_model.predict(x_train),
                1e-5)

# fit the model
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
new_model.fit(x_train, y_train, epochs=5, batch_size=50, callbacks=callbacks_list)
```
The loss continues to decrease after the model is loaded (restarting Python also causes no problem):
```
Using TensorFlow backend.
Epoch 1/5
500/500 [==============================] - 2s - loss: 0.3216
Epoch 00000: loss improved from inf to 0.32163, saving model to model.h5
Epoch 2/5
500/500 [==============================] - 0s - loss: 0.2923
Epoch 00001: loss improved from 0.32163 to 0.29234, saving model to model.h5
Epoch 3/5
500/500 [==============================] - 0s - loss: 0.2542
Epoch 00002: loss improved from 0.29234 to 0.25415, saving model to model.h5
Epoch 4/5
500/500 [==============================] - 0s - loss: 0.2086
Epoch 00003: loss improved from 0.25415 to 0.20860, saving model to model.h5
Epoch 5/5
500/500 [==============================] - 0s - loss: 0.1725
Epoch 00004: loss improved from 0.20860 to 0.17249, saving model to model.h5

Epoch 1/5
500/500 [==============================] - 0s - loss: 0.1454
Epoch 00000: loss improved from inf to 0.14543, saving model to model.h5
Epoch 2/5
500/500 [==============================] - 0s - loss: 0.1289
Epoch 00001: loss improved from 0.14543 to 0.12892, saving model to model.h5
Epoch 3/5
500/500 [==============================] - 0s - loss: 0.1169
Epoch 00002: loss improved from 0.12892 to 0.11694, saving model to model.h5
Epoch 4/5
500/500 [==============================] - 0s - loss: 0.1097
Epoch 00003: loss improved from 0.11694 to 0.10971, saving model to model.h5
Epoch 5/5
500/500 [==============================] - 0s - loss: 0.1057
Epoch 00004: loss improved from 0.10971 to 0.10570, saving model to model.h5
```
BTW, redefining the model followed by `load_weights()` definitely won't work, because `save_weights()` and `load_weights()` do not save/load the optimizer state.
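The difference between the two reload routes can be made visible through the optimizer's step counter. A minimal sketch, assuming TensorFlow 2.x (`tf.keras`); `build_model()`, the random data and the file names `full_model.keras` and `only.weights.h5` are hypothetical. In this sketch both routes restore identical weights, but only `load_model()` brings back the trained Adam state, while the rebuilt-and-compiled model gets an optimizer that has never taken a step.

```python
# Minimal sketch, assuming TensorFlow 2.x (tf.keras). build_model(), the random
# data and the file names are hypothetical stand-ins for the question's setup.
import numpy as np
import tensorflow as tf


def build_model():
    # Small Dense network in place of the 1000-unit LSTM stack.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(8,)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(loss="mean_squared_error", optimizer="adam")
    return model


x = np.random.rand(100, 8).astype("float32")
y = np.random.rand(100, 1).astype("float32")

trained = build_model()
trained.fit(x, y, epochs=3, batch_size=50, verbose=0)  # 2 batches per epoch
trained.save("full_model.keras")         # architecture + weights + optimizer
trained.save_weights("only.weights.h5")  # weights file

# Route 1: load_model() restores the optimizer together with the weights.
full = tf.keras.models.load_model("full_model.keras")

# Route 2: rebuild + compile + load_weights(); this optimizer has never stepped.
partial = build_model()
partial.load_weights("only.weights.h5")

full_steps = int(full.optimizer.iterations.numpy())      # carried over
fresh_steps = int(partial.optimizer.iterations.numpy())  # fresh optimizer
print("load_model steps:", full_steps, "| load_weights steps:", fresh_steps)

# The weights are identical either way; only the optimizer state differs.
np.testing.assert_allclose(full.predict(x, verbose=0),
                           partial.predict(x, verbose=0), rtol=1e-5)
```

Because Adam's moment estimates start fresh along with the counter in the second route, the first updates after such a reload differ from the ones an uninterrupted run would have made.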