Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Tensorflow & Keras can't load .ckpt save

So I am using the ModelCheckpoint callback to save the best epoch of a model I am training. It saves with no errors, but when I try to load it, I get the error:

2019-07-27 22:58:04.713951: W tensorflow/core/util/tensor_slice_reader.cc:95] Could not open C:\Users\Riley\PycharmProjects\myNN\cp.ckpt: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?

I have tried using the absolute/full path, but no luck. I'm sure I could use EarlyStopping, but I'd still like to understand why I am getting the error. Here is my code:

from __future__ import absolute_import, division, print_function

import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt
import datetime
import statistics

(train_images, train_labels), (test_images, test_labels) = np.load("dataset.npy", allow_pickle=True)

train_images = train_images / 255
test_images = test_images / 255

train_labels = list(map(float, train_labels))
test_labels = list(map(float, test_labels))
train_labels = [i/10 for i in train_labels]
test_labels = [i/10 for i in test_labels]

'''
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(128, 128)),
    keras.layers.Dense(64, activation=tf.nn.relu),
    keras.layers.Dense(1)
  ])

'''

start_time = datetime.datetime.now()

model = keras.Sequential([
    keras.layers.Conv2D(32, kernel_size=(5, 5), strides=(1, 1), activation='relu', input_shape=(128, 128, 1)),
    keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
    keras.layers.Dropout(0.2),
    keras.layers.Conv2D(64, (5, 5), activation='relu'),
    keras.layers.MaxPooling2D(pool_size=(2, 2)),
    keras.layers.Dropout(0.2),
    keras.layers.Flatten(),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(1000, activation='relu'),
    keras.layers.Dense(1)

])

model.compile(loss='mean_absolute_error',
    optimizer=keras.optimizers.SGD(lr=0.01),
    metrics=['mean_absolute_error', 'mean_squared_error'])

train_images = train_images.reshape(328, 128, 128, 1)
test_images = test_images.reshape(82, 128, 128, 1)

model.fit(train_images, train_labels, epochs=100, callbacks=[keras.callbacks.ModelCheckpoint("cp.ckpt", monitor='mean_absolute_error', save_best_only=True, verbose=1)])

model.load_weights("cp.ckpt")

predictions = model.predict(test_images)

totalDifference = 0
for i in range(82):
    print("%s: %s" % (test_labels[i] * 10, predictions[i] * 10))
    totalDifference += abs(test_labels[i] - predictions[i])

avgDifference = totalDifference / 8.2

print("\n%s\n" % avgDifference)
print("Time Elapsed:")
print(datetime.datetime.now() - start_time)
like image 824
Riley Fitzpatrick Avatar asked Oct 19 '25 13:10

Riley Fitzpatrick


2 Answers

TLDR; you are saving whole model, while trying to load only weights, that's not how it works.

Explanation

Your model's fit:

model.fit(
    train_images,
    train_labels,
    epochs=100,
    callbacks=[
        keras.callbacks.ModelCheckpoint(
            "cp.ckpt", monitor="mean_absolute_error", save_best_only=True, verbose=1
        )
    ],
)

As save_weights=False by default in ModelCheckpoint, you are saving whole model to .ckpt.

BTW. File should be named .hdf5 or .hf5 as it's Hierarchical Data Format 5. As Windows is not extension-agnostic you may run into some problems if tensorflow / keras relies on extension on this OS.

On the other hand you are loading the model's weights only, while the file contains whole model:

model.load_weights("cp.ckpt")

Tensorflow's checkpointing (.cp) mechanism is different from Keras's (.hdf5), so watch out for that (there are plans to integrate them more closely, see here and here).

Solution

So, either use the callback as you currently do, BUT use model.load("model.hdf5") or add save_weights_only=True argument to ModelCheckpoint:

model.fit(
    train_images,
    train_labels,
    epochs=100,
    callbacks=[
        keras.callbacks.ModelCheckpoint(
            "weights.hdf5",
            monitor="mean_absolute_error",
            save_best_only=True,
            verbose=1,
            save_weights_only=True,  # Specify this
        )
    ],
)

and you can use your model.load_weights("weights.hdf5").

like image 111
Szymon Maszke Avatar answered Oct 21 '25 03:10

Szymon Maszke


model.load_weights will not work here. Reason is mentioned in the above answer. You can load weights by this code. Load your model first and than load weights. I hope this code will help you out

import tensorflow as tf

model=dense_net()
ckpt = tf.train.Checkpoint(
step=tf.Variable(1, dtype=tf.int64),  net=model)
ckpt.restore(tf.train.latest_checkpoint("/kaggle/working/training_1/cp.ckpt.data-00001-of-00002"))
like image 43
Sohaib Anwaar Avatar answered Oct 21 '25 03:10

Sohaib Anwaar



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!