So I am using the ModelCheckpoint callback to save the best epoch of a model I am training. It saves with no errors, but when I try to load it, I get the error:
2019-07-27 22:58:04.713951: W tensorflow/core/util/tensor_slice_reader.cc:95] Could not open C:\Users\Riley\PycharmProjects\myNN\cp.ckpt: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
I have tried using the absolute/full path, but no luck. I'm sure I could use EarlyStopping, but I'd still like to understand why I am getting the error. Here is my code:
from __future__ import absolute_import, division, print_function
import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt
import datetime
import statistics
(train_images, train_labels), (test_images, test_labels) = np.load("dataset.npy", allow_pickle=True)
train_images = train_images / 255
test_images = test_images / 255
train_labels = list(map(float, train_labels))
test_labels = list(map(float, test_labels))
train_labels = [i/10 for i in train_labels]
test_labels = [i/10 for i in test_labels]
'''
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(128, 128)),
    keras.layers.Dense(64, activation=tf.nn.relu),
    keras.layers.Dense(1)
])
'''
start_time = datetime.datetime.now()
model = keras.Sequential([
    keras.layers.Conv2D(32, kernel_size=(5, 5), strides=(1, 1), activation='relu', input_shape=(128, 128, 1)),
    keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
    keras.layers.Dropout(0.2),
    keras.layers.Conv2D(64, (5, 5), activation='relu'),
    keras.layers.MaxPooling2D(pool_size=(2, 2)),
    keras.layers.Dropout(0.2),
    keras.layers.Flatten(),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(1000, activation='relu'),
    keras.layers.Dense(1)
])
model.compile(loss='mean_absolute_error',
              optimizer=keras.optimizers.SGD(lr=0.01),
              metrics=['mean_absolute_error', 'mean_squared_error'])
train_images = train_images.reshape(328, 128, 128, 1)
test_images = test_images.reshape(82, 128, 128, 1)
model.fit(train_images, train_labels, epochs=100, callbacks=[keras.callbacks.ModelCheckpoint("cp.ckpt", monitor='mean_absolute_error', save_best_only=True, verbose=1)])
model.load_weights("cp.ckpt")
predictions = model.predict(test_images)
totalDifference = 0
for i in range(82):
print("%s: %s" % (test_labels[i] * 10, predictions[i] * 10))
totalDifference += abs(test_labels[i] - predictions[i])
avgDifference = totalDifference / 8.2
print("\n%s\n" % avgDifference)
print("Time Elapsed:")
print(datetime.datetime.now() - start_time)
TL;DR: you are saving the whole model while trying to load only the weights; that's not how it works.
Your model's fit call:
model.fit(
    train_images,
    train_labels,
    epochs=100,
    callbacks=[
        keras.callbacks.ModelCheckpoint(
            "cp.ckpt", monitor="mean_absolute_error", save_best_only=True, verbose=1
        )
    ],
)
Since save_weights_only=False by default in ModelCheckpoint, you are saving the whole model (architecture, weights and optimizer state) to cp.ckpt.
BTW, the file should be named with a .hdf5 or .h5 extension, as it's Hierarchical Data Format 5. Since Windows is not extension-agnostic, you may run into problems if tensorflow/keras relies on the extension on this OS.
On the other hand, you are loading only the model's weights, while the file contains the whole model:
model.load_weights("cp.ckpt")
TensorFlow's checkpointing mechanism (.ckpt) is different from Keras's (.hdf5), so watch out for that (there are plans to integrate them more closely, see here and here).
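To make the difference concrete, here is a minimal sketch of the two weight formats (assuming tf.keras's Model.save_weights and its save_format argument; the tiny model and file names are only examples, not part of the question's code):
import tensorflow as tf
from tensorflow import keras

# A tiny throwaway model, just so there are some weights to save.
model = keras.Sequential([keras.layers.Dense(1, input_shape=(4,))])

# TensorFlow checkpoint format: writes cp.ckpt.index plus cp.ckpt.data-* shard files.
model.save_weights("cp.ckpt", save_format="tf")

# Keras HDF5 format: a single weights.h5 file.
model.save_weights("weights.h5", save_format="h5")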
So either keep the callback as you currently have it, BUT load the whole saved model back with keras.models.load_model (a sketch of that route follows after the snippet below), or add the save_weights_only=True argument to ModelCheckpoint:
model.fit(
    train_images,
    train_labels,
    epochs=100,
    callbacks=[
        keras.callbacks.ModelCheckpoint(
            "weights.hdf5",
            monitor="mean_absolute_error",
            save_best_only=True,
            verbose=1,
            save_weights_only=True,  # Specify this
        )
    ],
)
and then you can keep using model.load_weights("weights.hdf5").
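For the first route (keeping the callback's defaults and loading the whole model back), here is a minimal sketch, assuming the callback's filepath stays "cp.ckpt" as in the question; keras.models.load_model reconstructs the architecture, weights and optimizer state, so the model does not need to be rebuilt first:
from tensorflow import keras

# The default ModelCheckpoint saved the entire model, so load it back
# as a whole model rather than as bare weights.
model = keras.models.load_model("cp.ckpt")
predictions = model.predict(test_images)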
model.load_weights will not work here, for the reason explained in the answer above. You can load the weights with the code below: build your model first, then restore the weights into it. I hope this helps.
import tensorflow as tf

# Build the model first (dense_net() is the user's own model-building function).
model = dense_net()
ckpt = tf.train.Checkpoint(step=tf.Variable(1, dtype=tf.int64), net=model)

# tf.train.latest_checkpoint expects the checkpoint directory, not a .data-* shard file.
ckpt.restore(tf.train.latest_checkpoint("/kaggle/working/training_1"))
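For context, a minimal sketch of the full tf.train.Checkpoint round trip (the ./training_1 directory and the tiny model are only placeholders); note that tf.train.latest_checkpoint is given the checkpoint directory:
import tensorflow as tf
from tensorflow import keras

model = keras.Sequential([keras.layers.Dense(1, input_shape=(4,))])
ckpt = tf.train.Checkpoint(step=tf.Variable(1, dtype=tf.int64), net=model)
manager = tf.train.CheckpointManager(ckpt, "./training_1", max_to_keep=3)

manager.save()  # writes ckpt-1.index and ckpt-1.data-* shards into ./training_1
ckpt.restore(tf.train.latest_checkpoint("./training_1"))  # pass the directory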