Loss is NaN on image classification task

I'm trying to train a basic CNN on an image dataset of celebrity faces, where each person is assigned a class. Since there are about 10,000 classes, I used sparse_categorical_crossentropy rather than one-hot encoding the labels. However, as soon as the network starts training, the loss is stuck at one value, and after several batches it goes to NaN. I tried different scalings of the images and a smaller network, but with no luck. Any clues on what might be causing the NaN?

Function that generates batches:

import cv2
import numpy as np
from sklearn.utils import shuffle  # assumption: shuffle is sklearn's (keeps data and label aligned)

def Generator(data, label, batch_size):
    url = "../input/celeba-dataset/img_align_celeba/img_align_celeba/"
    INPUT_SHAPE = (109, 109)
    i = 0
    while True:
        image_batch = []
        label_batch = []
        for b in range(batch_size):
            # Reshuffle once every sample has been seen
            if i == len(data):
                i = 0
                data, label = shuffle(data, label)
            sample = data[i]
            label_batch.append(label[i])
            i += 1
            # Load the image, resize it, and scale pixel values to [0, 1]
            image = cv2.resize(cv2.imread(url + sample), INPUT_SHAPE)
            image_batch.append((image.astype(float)) / 255)

        yield (np.array(image_batch), np.array(label_batch))

The model:

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

class CNN():

    def __init__(self, train, val, y_train, y_val, batch_size):
        ## Load the batch generator
        self.train_batch_gen = Generator(train, y_train, batch_size)
        self.val_batch_gen = Generator(val, y_val, batch_size)

        self.input_shape = (109, 109, 3)
        # Size of the output layer: number of distinct labels in y_train
        self.num_classes = len(np.unique(y_train))
        self.len_train = len(train)
        self.len_val = len(val)

        self.batch_size = batch_size
        self.model = self.buildModel()

    def buildModel(self):

        model = models.Sequential()
        model.add(layers.Conv2D(32, (3, 3), activation='relu', padding="same", input_shape=self.input_shape))
        model.add(layers.Conv2D(64, (3, 3), activation='relu', padding="same"))
        model.add(layers.MaxPooling2D((2, 2)))
        model.add(layers.Conv2D(64, (3, 3), activation='relu', padding="same"))
        model.add(layers.Conv2D(128, (3, 3), activation='relu', padding="same"))
        model.add(layers.MaxPooling2D((2, 2)))
        model.add(layers.Conv2D(96, (3, 3), activation='relu', padding="same"))
        model.add(layers.Conv2D(192, (3, 3), activation='relu', padding="same"))
        model.add(layers.MaxPooling2D((2, 2)))
        model.add(layers.Conv2D(128, (3, 3), activation='relu', padding="same"))
        model.add(layers.Conv2D(256, (3, 3), activation='relu', padding="same"))
        model.add(layers.MaxPooling2D((2, 2)))
        model.add(layers.Conv2D(160, (3, 3), activation='relu', padding="same"))
        model.add(layers.Conv2D(320, (3, 3), activation='relu', padding="same"))
        model.add(layers.AveragePooling2D(pool_size=(4, 4)))
        model.add(layers.Flatten())
        model.add(layers.Dense(128, activation='tanh'))
        model.add(layers.Dropout(rate=0.1))
        model.add(layers.Dense(self.num_classes, activation="softmax"))  # Classification layer or output layer
        opt = tf.keras.optimizers.Adam(learning_rate=0.00001)
        model.compile(optimizer=opt, loss='sparse_categorical_crossentropy', metrics=['accuracy'])

        return model

    def trainModel(self, epochs):
        # Note: fit_generator is deprecated in newer TensorFlow releases, where model.fit accepts generators
        self.model.fit_generator(generator=self.train_batch_gen,
                                 steps_per_epoch=int(self.len_train // self.batch_size),
                                 epochs=epochs,
                                 validation_data=self.val_batch_gen,
                                 validation_steps=int(self.len_val // self.batch_size))
Will asked Jul 24 '19 12:07




1 Answer

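In my case, I used sparse_categorical_crossentropy with labels numbered [1,2,3] (3 classes), and it produced NaNs from the start. This loss expects integer labels in the range 0 to num_classes - 1, so with a 3-unit output layer the label 3 is out of range.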
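A quick way to check for this before training is to compare the label range with the size of the output layer. This is a minimal sketch, assuming the labels are integers that fit in a NumPy array; `labels_in_range` is a hypothetical helper, not part of Keras:

```python
import numpy as np

def labels_in_range(labels, num_classes):
    """Return True if every label lies in [0, num_classes - 1] (hypothetical helper)."""
    labels = np.asarray(labels)
    return bool(labels.min() >= 0 and labels.max() < num_classes)

print(labels_in_range([1, 2, 3], num_classes=3))  # False: label 3 is out of range
print(labels_in_range([0, 1, 2], num_classes=3))  # True
```

In the question's code, num_classes is len(np.unique(y_train)), so the check would fail whenever the person ids are not exactly 0 to num_classes - 1.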

When I changed the labels from [1,2,3] to [0,1,2], the problem disappeared.
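The relabeling does not have to be done by hand. The following is a sketch, assuming integer labels in a 1-D NumPy array (the sample values are made up for illustration). `np.unique` with `return_inverse=True` replaces each label with its position among the sorted distinct labels, which gives contiguous indices starting at 0 even if the original ids have gaps:

```python
import numpy as np

# Made-up example labels that start at 1 instead of 0
raw_labels = np.array([1, 3, 3, 2, 1])

# classes: sorted distinct labels; zero_based: index of each label within classes
classes, zero_based = np.unique(raw_labels, return_inverse=True)
num_classes = len(classes)

print(classes)      # distinct original labels
print(zero_based)   # remapped labels in [0, num_classes - 1]

# Apply the same mapping to other labels (valid only if each one occurs in classes)
val_labels = np.array([2, 1])
val_zero_based = np.searchsorted(classes, val_labels)
```

The validation labels need the same mapping as the training labels; the `np.searchsorted` line is one way to do that, provided every validation label also appears in the training set.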

Timbus Calin answered Oct 23 '22 17:10