So I am using the ModelCheckpoint callback to save the best epoch of a model I am training. It saves with no errors, but when I try to load it, I get the error:
2019-07-27 22:58:04.713951: W tensorflow/core/util/tensor_slice_reader.cc:95] Could not open C:\Users\Riley\PycharmProjects\myNN\cp.ckpt: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
I have tried using the absolute/full path, but no luck. I'm sure I could use EarlyStopping, but I'd still like to understand why I am getting the error. Here is my code:
from __future__ import absolute_import, division, print_function
import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt
import datetime
import statistics
(train_images, train_labels), (test_images, test_labels) = np.load("dataset.npy", allow_pickle=True)
train_images = train_images / 255
test_images = test_images / 255
train_labels = list(map(float, train_labels))
test_labels = list(map(float, test_labels))
train_labels = [i/10 for i in train_labels]
test_labels = [i/10 for i in test_labels]
'''
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(128, 128)),
    keras.layers.Dense(64, activation=tf.nn.relu),
    keras.layers.Dense(1)
])
'''
start_time = datetime.datetime.now()
model = keras.Sequential([
    keras.layers.Conv2D(32, kernel_size=(5, 5), strides=(1, 1), activation='relu', input_shape=(128, 128, 1)),
    keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
    keras.layers.Dropout(0.2),
    keras.layers.Conv2D(64, (5, 5), activation='relu'),
    keras.layers.MaxPooling2D(pool_size=(2, 2)),
    keras.layers.Dropout(0.2),
    keras.layers.Flatten(),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(1000, activation='relu'),
    keras.layers.Dense(1)
])
model.compile(loss='mean_absolute_error',
              optimizer=keras.optimizers.SGD(lr=0.01),
              metrics=['mean_absolute_error', 'mean_squared_error'])
train_images = train_images.reshape(328, 128, 128, 1)
test_images = test_images.reshape(82, 128, 128, 1)
model.fit(train_images, train_labels, epochs=100, callbacks=[keras.callbacks.ModelCheckpoint("cp.ckpt", monitor='mean_absolute_error', save_best_only=True, verbose=1)])
model.load_weights("cp.ckpt")
predictions = model.predict(test_images)
totalDifference = 0
for i in range(82):
print("%s: %s" % (test_labels[i] * 10, predictions[i] * 10))
totalDifference += abs(test_labels[i] - predictions[i])
avgDifference = totalDifference / 8.2
print("\n%s\n" % avgDifference)
print("Time Elapsed:")
print(datetime.datetime.now() - start_time)
TL;DR: you are saving the whole model while trying to load only the weights; that's not how it works.
Your model's fit call:
model.fit(
    train_images,
    train_labels,
    epochs=100,
    callbacks=[
        keras.callbacks.ModelCheckpoint(
            "cp.ckpt", monitor="mean_absolute_error", save_best_only=True, verbose=1
        )
    ],
)
Since save_weights_only=False by default in ModelCheckpoint, you are saving the whole model (architecture, weights and optimizer state) to cp.ckpt.
BTW, the file should be named with a .hdf5 or .h5 extension, as it's Hierarchical Data Format 5. Since Windows is not extension-agnostic, you may run into problems if tensorflow/keras relies on the extension on this OS.
On the other hand, you are loading only the model's weights, while the file contains the whole model:
model.load_weights("cp.ckpt")
TensorFlow's checkpointing mechanism (.ckpt) is different from Keras's (.hdf5), so watch out for that (there are plans to integrate them more closely, see here and here).
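To make the difference concrete, here is a minimal sketch of the two weight formats (assuming tf.keras's Model.save_weights and its save_format argument; the tiny model and file names are only examples, not part of the question's code):
import tensorflow as tf
from tensorflow import keras

# A tiny throwaway model, just so there are some weights to save.
model = keras.Sequential([keras.layers.Dense(1, input_shape=(4,))])

# TensorFlow checkpoint format: writes cp.ckpt.index plus cp.ckpt.data-* shard files.
model.save_weights("cp.ckpt", save_format="tf")

# Keras HDF5 format: a single weights.h5 file.
model.save_weights("weights.h5", save_format="h5")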
So either keep the callback as you currently have it, BUT load the whole saved model back with keras.models.load_model (a sketch of that route follows after the snippet below), or add the save_weights_only=True argument to ModelCheckpoint:
model.fit(
    train_images,
    train_labels,
    epochs=100,
    callbacks=[
        keras.callbacks.ModelCheckpoint(
            "weights.hdf5",
            monitor="mean_absolute_error",
            save_best_only=True,
            verbose=1,
            save_weights_only=True,  # Specify this
        )
    ],
)
and then you can keep using model.load_weights("weights.hdf5").
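For the first route (keeping the callback's defaults and loading the whole model back), here is a minimal sketch, assuming the callback's filepath stays "cp.ckpt" as in the question; keras.models.load_model reconstructs the architecture, weights and optimizer state, so the model does not need to be rebuilt first:
from tensorflow import keras

# The default ModelCheckpoint saved the entire model, so load it back
# as a whole model rather than as bare weights.
model = keras.models.load_model("cp.ckpt")
predictions = model.predict(test_images)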
model.load_weights will not work here, for the reason explained in the answer above. You can load the weights with the code below: build your model first, then restore the weights into it. I hope this helps.
import tensorflow as tf

# Build the model first (dense_net() is the user's own model-building function).
model = dense_net()
ckpt = tf.train.Checkpoint(step=tf.Variable(1, dtype=tf.int64), net=model)

# tf.train.latest_checkpoint expects the checkpoint directory, not a .data-* shard file.
ckpt.restore(tf.train.latest_checkpoint("/kaggle/working/training_1"))
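For context, a minimal sketch of the full tf.train.Checkpoint round trip (the ./training_1 directory and the tiny model are only placeholders); note that tf.train.latest_checkpoint is given the checkpoint directory:
import tensorflow as tf
from tensorflow import keras

model = keras.Sequential([keras.layers.Dense(1, input_shape=(4,))])
ckpt = tf.train.Checkpoint(step=tf.Variable(1, dtype=tf.int64), net=model)
manager = tf.train.CheckpointManager(ckpt, "./training_1", max_to_keep=3)

manager.save()  # writes ckpt-1.index and ckpt-1.data-* shards into ./training_1
ckpt.restore(tf.train.latest_checkpoint("./training_1"))  # pass the directory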