
Accuracy no longer improving after switching to Dataset

I recently trained a binary image classifier and ended up with a model that was around 97.8% accurate. I created this classifier by following a couple of official TensorFlow guides, namely:

  • https://www.tensorflow.org/tutorials/images/classification
  • https://www.tensorflow.org/tutorials/load_data/images

I noticed while training (on a GTX 1080) that each epoch was taking around 30 seconds to run. Further reading suggested that a better way to feed data into a TensorFlow training run is to use a tf.data.Dataset, so I updated my code to load the images into a dataset and pass it to the model.fit_generator method.

Now when I perform my training I find that my accuracy and loss metrics are static - even with the learning rate changing automatically over time. The output looks something like this:

loss: 7.7125 - acc: 0.5000 - val_loss: 7.7125 - val_acc: 0.5000

Given that I'm training a binary classifier, an accuracy of 50% is the same as guessing, so I'm wondering if there's a problem with the way I'm providing the images, or perhaps with the size of the dataset.

My image data is split like this:

training/
        true/  (366 images)
        false/ (354 images)

validation/
        true/  (175 images)
        false/ (885 images)

I was previously using ImageDataGenerator with various augmentations applied, which effectively increased the size of the training set. Is my problem the size of my dataset?
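For context, the earlier ImageDataGenerator pipeline was along these lines (a rough sketch only; the augmentation parameters shown here are illustrative, not my exact settings):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

import settings

# Illustrative augmentation settings; the real values may differ.
train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    rotation_range=20,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True
)

train_generator = train_datagen.flow_from_directory(
    settings.TRAINING_DIRECTORY,
    target_size=(settings.TARGET_IMAGE_HEIGHT, settings.TARGET_IMAGE_WIDTH),
    batch_size=settings.TRAINING_BATCH_SIZE,
    class_mode='binary'
)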

The application code I'm using is as follows:

import math

import tensorflow as tf
import os

from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.callbacks import EarlyStopping

import helpers
import settings

AUTOTUNE = tf.data.experimental.AUTOTUNE

assert tf.test.is_built_with_cuda()
assert tf.test.is_gpu_available()

# Collect the list of training files and process their paths.
training_dataset_files = tf.data.Dataset.list_files(os.path.join(settings.TRAINING_DIRECTORY, '*', '*.png'))
training_dataset_labelled = training_dataset_files.map(helpers.process_path, num_parallel_calls=AUTOTUNE)
training_dataset = helpers.prepare_for_training(training_dataset_labelled)

# Collect the validation files.
validation_dataset_files = tf.data.Dataset.list_files(os.path.join(settings.VALIDATION_DIRECTORY, '*', '*.png'))
validation_dataset_labelled = validation_dataset_files.map(helpers.process_path, num_parallel_calls=AUTOTUNE)
validation_dataset = helpers.prepare_for_training(validation_dataset_labelled)

model = tf.keras.models.Sequential([
    # This is the first convolution
    tf.keras.layers.Conv2D(16, (3, 3), activation='relu', input_shape=(settings.TARGET_IMAGE_HEIGHT, settings.TARGET_IMAGE_WIDTH, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    # The second convolution
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # The third convolution
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # The fourth convolution
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # The fifth convolution
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # Flatten the results to feed into a DNN
    tf.keras.layers.Flatten(),
    # 512 neuron hidden layer
    tf.keras.layers.Dense(512, activation='relu'),
    # Only 1 output neuron. It will contain a value from 0-1 where 0 for 1 class ('false') and 1 for the other ('true')
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.summary()

model.compile(
    loss='binary_crossentropy',
    optimizer=RMSprop(lr=0.1),
    metrics=['acc']
)

callbacks = [
    # EarlyStopping(patience=4),
    tf.keras.callbacks.ReduceLROnPlateau(
        monitor='val_acc',
        patience=2,
        verbose=1,
        factor=0.5,
        min_lr=0.00001
    ),
    tf.keras.callbacks.ModelCheckpoint(
        # Path where to save the model
        filepath=settings.CHECKPOINT_FILE,
        # The two parameters below mean that we will overwrite
        # the current checkpoint if and only if
        # the `val_loss` score has improved.
        save_best_only=True,
        monitor='val_loss',
        verbose=1
    ),
    tf.keras.callbacks.TensorBoard(
        log_dir=settings.LOG_DIRECTORY,
        histogram_freq=1
    )
]

training_dataset_length = tf.data.experimental.cardinality(training_dataset_files).numpy()
steps_per_epoch = math.ceil(training_dataset_length // settings.TRAINING_BATCH_SIZE)

validation_dataset_length = tf.data.experimental.cardinality(validation_dataset_files).numpy()
validation_steps = math.ceil(validation_dataset_length // settings.VALIDATION_BATCH_SIZE)

history = model.fit_generator(
    training_dataset,
    steps_per_epoch=steps_per_epoch,
    epochs=20000,
    verbose=1,
    validation_data=validation_dataset,
    validation_steps=validation_steps,
    callbacks=callbacks,
)

model.save(settings.FULL_MODEL_FILE)

With helpers.py looking like this:

import tensorflow as tf
import settings

AUTOTUNE = tf.data.experimental.AUTOTUNE


def process_path(file_path):
    parts = tf.strings.split(file_path, '\\')
    label = parts[-2] == settings.CLASS_NAMES

    # Read the file and decode the image
    img = tf.io.read_file(file_path)
    img = tf.image.decode_png(img, channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)
    img = tf.image.resize(img, [settings.TARGET_IMAGE_HEIGHT, settings.TARGET_IMAGE_WIDTH])
    return img, label


def prepare_for_training(ds, cache=True, shuffle_buffer_size=10000):
    if cache:
        if isinstance(cache, str):
            ds = ds.cache(cache)
        else:
            ds = ds.cache()

    ds = ds.shuffle(buffer_size=shuffle_buffer_size)

    ds = ds.repeat()
    ds = ds.batch(settings.TRAINING_BATCH_SIZE)
    ds = ds.prefetch(buffer_size=AUTOTUNE)

    return ds

A larger snippet of application output is as follows:

21/22 [===========================>..] - ETA: 0s - loss: 7.7125 - acc: 0.5000
Epoch 00207: val_loss did not improve from 7.71247
22/22 [==============================] - 5s 247ms/step - loss: 7.7125 - acc: 0.5000 - val_loss: 7.7125 - val_acc: 0.5000
Epoch 208/20000
21/22 [===========================>..] - ETA: 0s - loss: 7.7125 - acc: 0.5000
Epoch 00208: val_loss did not improve from 7.71247
22/22 [==============================] - 5s 248ms/step - loss: 7.7125 - acc: 0.5000 - val_loss: 7.7125 - val_acc: 0.5000
Epoch 209/20000
21/22 [===========================>..] - ETA: 0s - loss: 7.7125 - acc: 0.5000
Epoch 00209: val_loss did not improve from 7.71247
22/22 [==============================] - 6s 251ms/step - loss: 7.7125 - acc: 0.5000 - val_loss: 7.7125 - val_acc: 0.5000
Epoch 210/20000
21/22 [===========================>..] - ETA: 0s - loss: 7.7125 - acc: 0.5000
Epoch 00210: val_loss did not improve from 7.71247
22/22 [==============================] - 5s 242ms/step - loss: 7.7125 - acc: 0.5000 - val_loss: 7.7125 - val_acc: 0.5000
Epoch 211/20000
21/22 [===========================>..] - ETA: 0s - loss: 7.7125 - acc: 0.5000
Epoch 00211: val_loss did not improve from 7.71247
22/22 [==============================] - 5s 246ms/step - loss: 7.7125 - acc: 0.5000 - val_loss: 7.7125 - val_acc: 0.5000
Epoch 212/20000
21/22 [===========================>..] - ETA: 0s - loss: 7.7125 - acc: 0.5000
Epoch 00212: val_loss did not improve from 7.71247
22/22 [==============================] - 6s 252ms/step - loss: 7.7125 - acc: 0.5000 - val_loss: 7.7125 - val_acc: 0.5000
Epoch 213/20000
21/22 [===========================>..] - ETA: 0s - loss: 7.7125 - acc: 0.5000
Epoch 00213: val_loss did not improve from 7.71247
22/22 [==============================] - 5s 242ms/step - loss: 7.7125 - acc: 0.5000 - val_loss: 7.7125 - val_acc: 0.5000
Epoch 214/20000
21/22 [===========================>..] - ETA: 0s - loss: 7.7125 - acc: 0.5000
Epoch 00214: val_loss did not improve from 7.71247
22/22 [==============================] - 5s 241ms/step - loss: 7.7125 - acc: 0.5000 - val_loss: 7.7125 - val_acc: 0.5000
Epoch 215/20000
21/22 [===========================>..] - ETA: 0s - loss: 7.7125 - acc: 0.5000
Epoch 00215: val_loss did not improve from 7.71247
22/22 [==============================] - 5s 247ms/step - loss: 7.7125 - acc: 0.5000 - val_loss: 7.7125 - val_acc: 0.5000
Epoch 216/20000
21/22 [===========================>..] - ETA: 0s - loss: 7.7125 - acc: 0.5000
Epoch 00216: val_loss did not improve from 7.71247
22/22 [==============================] - 5s 248ms/step - loss: 7.7125 - acc: 0.5000 - val_loss: 7.7125 - val_acc: 0.5000
Epoch 217/20000
21/22 [===========================>..] - ETA: 0s - loss: 7.7125 - acc: 0.5000
Epoch 00217: val_loss did not improve from 7.71247
22/22 [==============================] - 5s 249ms/step - loss: 7.7125 - acc: 0.5000 - val_loss: 7.7125 - val_acc: 0.5000
Epoch 218/20000
21/22 [===========================>..] - ETA: 0s - loss: 7.7125 - acc: 0.5000
Epoch 00218: val_loss did not improve from 7.71247
22/22 [==============================] - 5s 244ms/step - loss: 7.7125 - acc: 0.5000 - val_loss: 7.7125 - val_acc: 0.5000
Epoch 219/20000
19/22 [========================>.....] - ETA: 0s - loss: 7.7125 - acc: 0.5000
Asked Nov 06 '19 by Daniel Samuels

1 Answer

There are some things you should check.

  • Try training a few more times; you may have been unlucky with the 'relu' activations (if one layer's outputs all go to zero, you're stuck forever).
  • Take an x, y pair from the dataset and verify that y is between 0 and 1 (because you're using 'sigmoid'); see the sketch below.

These two are the most troublesome and probable things.
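A quick way to check both is to pull a single batch out of the dataset and look at it directly (a minimal sketch, assuming eager execution and that training_dataset is the batched dataset from your script):

import tensorflow as tf

# Take one batch from the (shuffled, repeated, batched) dataset and inspect it.
for x_batch, y_batch in training_dataset.take(1):
    print('x shape:', x_batch.shape)  # expect (batch, height, width, 3)
    print('x range:', float(tf.reduce_min(x_batch)), float(tf.reduce_max(x_batch)))
    print('y shape:', y_batch.shape)  # for a 1-neuron sigmoid, expect one value per sample
    print('y values:', y_batch.numpy())

If y is not a single 0/1 value per sample, or x is far outside the range the model expects, that alone can pin the loss and accuracy where you're seeing them.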

Later you might want to check whether x from the dataset is within the same range you trained with before (not crucial, but it might change the performance a little), whether the number of channels is the same, etc.


For the relus, there are solutions like this one.
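One common mitigation (not necessarily the one referred to above) is to replace the plain 'relu' activations with LeakyReLU, which keeps a small gradient for negative inputs so units can recover instead of dying. A rough sketch, with a placeholder input shape:

import tensorflow as tf

# Each Conv2D/Dense loses its activation argument and is followed by LeakyReLU.
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(16, (3, 3), input_shape=(150, 150, 3)),  # use your TARGET_IMAGE_HEIGHT/WIDTH here
    tf.keras.layers.LeakyReLU(alpha=0.1),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(32, (3, 3)),
    tf.keras.layers.LeakyReLU(alpha=0.1),
    tf.keras.layers.MaxPooling2D(2, 2),
    # ... remaining convolution blocks follow the same pattern ...
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512),
    tf.keras.layers.LeakyReLU(alpha=0.1),
    tf.keras.layers.Dense(1, activation='sigmoid')
])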

Answered Oct 11 '22 by Daniel Möller