I recently trained a binary image classifier and ended up with a model that was around 97.8% accurate. I created this classifier by following a couple of official TensorFlow guides, namely:
I noticed while training (on a GTX 1080) that each epoch was taking around 30 seconds to run. Further reading revealed that a better way to load data into a TensorFlow training run is to use a Dataset. So I updated my code to load the images into a dataset and then have them read by the model.fit_generator method.
Now when I perform my training I find that my accuracy and loss metrics are static - even with the learning rate changing automatically over time. The output looks something like this:
loss: 7.7125 - acc: 0.5000 - val_loss: 7.7125 - val_acc: 0.5000
Given that I'm training a binary classifier, an accuracy of 50% is the same as guessing, so I'm wondering if there's a problem with the way I'm providing the images, or perhaps with the size of the dataset.
My image data is split like this:
training/
    true/ (366 images)
    false/ (354 images)
validation/
    true/ (175 images)
    false/ (885 images)
Previously I was using ImageDataGenerator with various mutations being performed, which effectively increased the overall size of the dataset. Is my problem with the size of my dataset?
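As a quick sanity check on the split listed above, it can help to work out what accuracy a model would get by always predicting the most common class. The sketch below uses only the image counts from the directory listing; the helper name `majority_baseline` is made up for illustration.

```python
# Image counts taken from the directory listing above.
train = {"true": 366, "false": 354}
val = {"true": 175, "false": 885}


def majority_baseline(counts):
    """Accuracy of a model that always predicts the most common class."""
    return max(counts.values()) / sum(counts.values())


train_baseline = majority_baseline(train)  # 366 / 720
val_baseline = majority_baseline(val)      # 885 / 1060

print(f"train baseline: {train_baseline:.4f}")
print(f"val baseline:   {val_baseline:.4f}")
```

On the validation set a constant prediction would score roughly 0.835 (or 0.165 for the other class), not 0.5000. An accuracy of exactly 0.5000 on both sets therefore hints that the label encoding is worth inspecting before blaming the dataset size.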
The application code I'm using is as follows:
import math
import tensorflow as tf
import os
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.callbacks import EarlyStopping
import helpers
import settings
AUTOTUNE = tf.data.experimental.AUTOTUNE
assert tf.test.is_built_with_cuda()
assert tf.test.is_gpu_available()
# Collect the list of training files and process their paths.
training_dataset_files = tf.data.Dataset.list_files(os.path.join(settings.TRAINING_DIRECTORY, '*', '*.png'))
training_dataset_labelled = training_dataset_files.map(helpers.process_path, num_parallel_calls=AUTOTUNE)
training_dataset = helpers.prepare_for_training(training_dataset_labelled)
# Collect the validation files.
validation_dataset_files = tf.data.Dataset.list_files(os.path.join(settings.VALIDATION_DIRECTORY, '*', '*.png'))
validation_dataset_labelled = validation_dataset_files.map(helpers.process_path, num_parallel_calls=AUTOTUNE)
validation_dataset = helpers.prepare_for_training(validation_dataset_labelled)
model = tf.keras.models.Sequential([
    # This is the first convolution
    tf.keras.layers.Conv2D(16, (3, 3), activation='relu', input_shape=(settings.TARGET_IMAGE_HEIGHT, settings.TARGET_IMAGE_WIDTH, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    # The second convolution
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # The third convolution
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # The fourth convolution
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # The fifth convolution
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # Flatten the results to feed into a DNN
    tf.keras.layers.Flatten(),
    # 512 neuron hidden layer
    tf.keras.layers.Dense(512, activation='relu'),
    # Only 1 output neuron. It will contain a value from 0-1 where 0 for 1 class ('false') and 1 for the other ('true')
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.summary()
model.compile(
    loss='binary_crossentropy',
    optimizer=RMSprop(lr=0.1),
    metrics=['acc']
)
callbacks = [
    # EarlyStopping(patience=4),
    tf.keras.callbacks.ReduceLROnPlateau(
        monitor='val_acc',
        patience=2,
        verbose=1,
        factor=0.5,
        min_lr=0.00001
    ),
    tf.keras.callbacks.ModelCheckpoint(
        # Path where to save the model
        filepath=settings.CHECKPOINT_FILE,
        # The two parameters below mean that we will overwrite
        # the current checkpoint if and only if
        # the `val_loss` score has improved.
        save_best_only=True,
        monitor='val_loss',
        verbose=1
    ),
    tf.keras.callbacks.TensorBoard(
        log_dir=settings.LOG_DIRECTORY,
        histogram_freq=1
    )
]
training_dataset_length = tf.data.experimental.cardinality(training_dataset_files).numpy()
steps_per_epoch = math.ceil(training_dataset_length // settings.TRAINING_BATCH_SIZE)
validation_dataset_length = tf.data.experimental.cardinality(validation_dataset_files).numpy()
validation_steps = math.ceil(validation_dataset_length // settings.VALIDATION_BATCH_SIZE)
history = model.fit_generator(
    training_dataset,
    steps_per_epoch=steps_per_epoch,
    epochs=20000,
    verbose=1,
    validation_data=validation_dataset,
    validation_steps=validation_steps,
    callbacks=callbacks,
)
model.save(settings.FULL_MODEL_FILE)
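A side note on the step calculation in the script: `//` already floors the result, so wrapping it in `math.ceil` has no effect. The sketch below shows the difference, assuming a batch size of 32. That value is a guess, since `settings.TRAINING_BATCH_SIZE` is not shown, although it is consistent with the 720 training images and the 22 steps per epoch seen in the output.

```python
import math

num_images = 720   # 366 + 354 training images from the listing above
batch_size = 32    # assumption: settings.TRAINING_BATCH_SIZE is not shown

# Floor division happens first, so math.ceil receives an int and changes nothing.
floor_steps = math.ceil(num_images // batch_size)

# True division keeps the fractional batch, which math.ceil then rounds up.
ceil_steps = math.ceil(num_images / batch_size)

print(floor_steps, ceil_steps)
```

Because the dataset is repeated indefinitely, either value runs without error; the floored version simply covers slightly less than one full pass over the data per epoch. This is unlikely to be related to the stuck metrics.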
With helpers.py looking like this:
import tensorflow as tf
import settings
AUTOTUNE = tf.data.experimental.AUTOTUNE

def process_path(file_path):
    parts = tf.strings.split(file_path, '\\')
    label = parts[-2] == settings.CLASS_NAMES
    # Read the file and decode the image
    img = tf.io.read_file(file_path)
    img = tf.image.decode_png(img, channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)
    img = tf.image.resize(img, [settings.TARGET_IMAGE_HEIGHT, settings.TARGET_IMAGE_WIDTH])
    return img, label


def prepare_for_training(ds, cache=True, shuffle_buffer_size=10000):
    if cache:
        if isinstance(cache, str):
            ds = ds.cache(cache)
        else:
            ds = ds.cache()
    ds = ds.shuffle(buffer_size=shuffle_buffer_size)
    ds = ds.repeat()
    ds = ds.batch(settings.TRAINING_BATCH_SIZE)
    ds = ds.prefetch(buffer_size=AUTOTUNE)
    return ds
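One thing worth noting about `process_path` above: if `settings.CLASS_NAMES` holds both class names (an assumption, since the settings module is not shown), then `parts[-2] == settings.CLASS_NAMES` produces one boolean per class rather than a single 0/1 value, while a single sigmoid output expects a single value per image. Splitting on `'\\'` also only works for Windows-style paths. The plain-Python sketch below illustrates the intended label logic outside of TensorFlow; the function name and example paths are hypothetical.

```python
def label_from_path(file_path, positive_class="true"):
    """Return 1.0 if the image's parent directory is the positive class, else 0.0."""
    # Normalise separators so the same code handles Windows and POSIX paths.
    parts = file_path.replace("\\", "/").split("/")
    # parts[-1] is the file name, parts[-2] is the class directory.
    return 1.0 if parts[-2] == positive_class else 0.0


# Hypothetical example paths in the layout described in the question.
print(label_from_path("training\\true\\img001.png"))
print(label_from_path("training/false/img002.png"))
```

Inside the tf.data pipeline, the analogous change would be to compare `parts[-2]` against a single class name and cast the result to a float, for example with `tf.cast(parts[-2] == 'true', tf.float32)`, and to split on `os.sep` rather than a hard-coded backslash.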
A larger snippet of application output is as follows:
21/22 [===========================>..] - ETA: 0s - loss: 7.7125 - acc: 0.5000
Epoch 00207: val_loss did not improve from 7.71247
22/22 [==============================] - 5s 247ms/step - loss: 7.7125 - acc: 0.5000 - val_loss: 7.7125 - val_acc: 0.5000
Epoch 208/20000
21/22 [===========================>..] - ETA: 0s - loss: 7.7125 - acc: 0.5000
Epoch 00208: val_loss did not improve from 7.71247
22/22 [==============================] - 5s 248ms/step - loss: 7.7125 - acc: 0.5000 - val_loss: 7.7125 - val_acc: 0.5000
Epoch 209/20000
21/22 [===========================>..] - ETA: 0s - loss: 7.7125 - acc: 0.5000
Epoch 00209: val_loss did not improve from 7.71247
22/22 [==============================] - 6s 251ms/step - loss: 7.7125 - acc: 0.5000 - val_loss: 7.7125 - val_acc: 0.5000
Epoch 210/20000
21/22 [===========================>..] - ETA: 0s - loss: 7.7125 - acc: 0.5000
Epoch 00210: val_loss did not improve from 7.71247
22/22 [==============================] - 5s 242ms/step - loss: 7.7125 - acc: 0.5000 - val_loss: 7.7125 - val_acc: 0.5000
Epoch 211/20000
21/22 [===========================>..] - ETA: 0s - loss: 7.7125 - acc: 0.5000
Epoch 00211: val_loss did not improve from 7.71247
22/22 [==============================] - 5s 246ms/step - loss: 7.7125 - acc: 0.5000 - val_loss: 7.7125 - val_acc: 0.5000
Epoch 212/20000
21/22 [===========================>..] - ETA: 0s - loss: 7.7125 - acc: 0.5000
Epoch 00212: val_loss did not improve from 7.71247
22/22 [==============================] - 6s 252ms/step - loss: 7.7125 - acc: 0.5000 - val_loss: 7.7125 - val_acc: 0.5000
Epoch 213/20000
21/22 [===========================>..] - ETA: 0s - loss: 7.7125 - acc: 0.5000
Epoch 00213: val_loss did not improve from 7.71247
22/22 [==============================] - 5s 242ms/step - loss: 7.7125 - acc: 0.5000 - val_loss: 7.7125 - val_acc: 0.5000
Epoch 214/20000
21/22 [===========================>..] - ETA: 0s - loss: 7.7125 - acc: 0.5000
Epoch 00214: val_loss did not improve from 7.71247
22/22 [==============================] - 5s 241ms/step - loss: 7.7125 - acc: 0.5000 - val_loss: 7.7125 - val_acc: 0.5000
Epoch 215/20000
21/22 [===========================>..] - ETA: 0s - loss: 7.7125 - acc: 0.5000
Epoch 00215: val_loss did not improve from 7.71247
22/22 [==============================] - 5s 247ms/step - loss: 7.7125 - acc: 0.5000 - val_loss: 7.7125 - val_acc: 0.5000
Epoch 216/20000
21/22 [===========================>..] - ETA: 0s - loss: 7.7125 - acc: 0.5000
Epoch 00216: val_loss did not improve from 7.71247
22/22 [==============================] - 5s 248ms/step - loss: 7.7125 - acc: 0.5000 - val_loss: 7.7125 - val_acc: 0.5000
Epoch 217/20000
21/22 [===========================>..] - ETA: 0s - loss: 7.7125 - acc: 0.5000
Epoch 00217: val_loss did not improve from 7.71247
22/22 [==============================] - 5s 249ms/step - loss: 7.7125 - acc: 0.5000 - val_loss: 7.7125 - val_acc: 0.5000
Epoch 218/20000
21/22 [===========================>..] - ETA: 0s - loss: 7.7125 - acc: 0.5000
Epoch 00218: val_loss did not improve from 7.71247
22/22 [==============================] - 5s 244ms/step - loss: 7.7125 - acc: 0.5000 - val_loss: 7.7125 - val_acc: 0.5000
Epoch 219/20000
19/22 [========================>.....] - ETA: 0s - loss: 7.7125 - acc: 0.5000
The validation set may be too small, such that small changes in the output cause large fluctuations in the validation error.
Too Much Data
Having more data generally increases the accuracy of your model, but there comes a stage where even adding infinite amounts of data cannot improve accuracy any further. This is what we call the natural noise of the data.
If the training accuracy of your model is low, it's an indication that your current model configuration can't capture the complexity of your data. Try adjusting the training parameters.
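To make the effect of the training parameters in the question concrete, the sketch below counts how many reductions it takes for the `ReduceLROnPlateau` settings shown there (`factor=0.5`, `min_lr=0.00001`) to bring the starting rate of 0.1 down to its floor. It assumes the callback's update rule is `new_lr = max(old_lr * factor, min_lr)`; the helper name is made up for illustration.

```python
def halvings_to_floor(initial_lr, factor, min_lr):
    """Count reductions until lr = max(lr * factor, min_lr) reaches min_lr."""
    lr, steps = initial_lr, 0
    while lr > min_lr:
        lr = max(lr * factor, min_lr)
        steps += 1
    return steps


# Values taken from the compile() and ReduceLROnPlateau() calls in the question.
steps = halvings_to_floor(initial_lr=0.1, factor=0.5, min_lr=0.00001)
print(steps)
```

This comes out at 14 reductions. With `patience=2` and a metric that never improves, the floor would be reached within the first few dozen epochs, so by epoch 207 the rate is probably already at its minimum. A starting rate as high as 0.1 can also saturate the network in the very first steps, after which lowering it later may not help, so trying a much smaller initial value is a reasonable experiment.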
There are some things you must check.
Take an x, y pair from the dataset and verify that y is within 0 and 1 (because you're using 'sigmoid'). These two are the most troublesome and probable things.
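One way to see why the label check matters is to reproduce the stuck numbers by hand. The sketch below is plain Python, not the Keras implementation. It assumes that each label is a two-element vector such as [1, 0] (which is what comparing a directory name against a list of two class names would give), that the single sigmoid output has saturated at 0 and is broadcast against both elements, and that the loss clips predictions with an epsilon of 1e-7 in the way Keras' backend did at the time. All three are assumptions.

```python
import math

EPS = 1e-7  # assumption: Keras' default fuzz factor


def bce(y_true, p):
    """Per-element binary cross-entropy with Keras-style clipping (assumed)."""
    p = min(max(p, EPS), 1.0 - EPS)
    return -(y_true * math.log(p + EPS) + (1.0 - y_true) * math.log(1.0 - p + EPS))


# Hypothetical batch: every label has exactly one 1 and one 0.
labels = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
prediction = 0.0  # a fully saturated sigmoid output

flat = [y for pair in labels for y in pair]
losses = [bce(y, prediction) for y in flat]
mean_loss = sum(losses) / len(losses)
accuracy = sum((prediction > 0.5) == (y == 1.0) for y in flat) / len(flat)

print(round(mean_loss, 4), accuracy)
```

Under these assumptions exactly half of the label elements are always "wrong", whatever the class balance, which gives an accuracy of 0.5 and a mean loss of about 7.7125, matching the training output in the question. If y turns out to be a single 0/1 value per image, this explanation does not apply.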
Later you might want to check whether x from the dataset is within the same range you trained on before (not crucial, but it might change the performance a little), whether the number of channels is the same, etc.
For the relus, there are solutions like this one.