Loss is NaN on image classification task

I'm trying to train a basic CNN on an image dataset of celebrity faces, where each image's class corresponds to the person shown. Since there are about 10,000 classes, I used sparse_categorical_crossentropy rather than one-hot encoding the labels. However, as soon as the network starts training, the loss gets stuck at one value, and after several batches it goes to NaN. I tried different image scalings and a smaller network, but with no luck. Any clues on what might be causing the NaN?

Function that generates batches:

def Generator(data, label, batch_size):
    url = "../input/celeba-dataset/img_align_celeba/img_align_celeba/"
    INPUT_SHAPE = (109, 109)
    i = 0
    while True:
        image_batch = [ ]
        label_batch = [ ]
        for b in range(batch_size):
            if i == len(data):
                i = 0
                data, label = shuffle(data, label)
            sample = data[i]
            label_batch.append(label[i])
            i += 1
            image = cv2.resize(cv2.imread(url + sample), INPUT_SHAPE)
            image_batch.append((image.astype(float)) / 255)

        yield (np.array(image_batch), np.array(label_batch))

The model:

class CNN():

    def __init__(self, train, val, y_train, y_val, batch_size):
        ## Load the batch generator
        self.train_batch_gen = Generator(train, y_train, batch_size)
        self.val_batch_gen = Generator(val, y_val, batch_size)

        self.input_shape = (109, 109, 3)
        self.num_classes = len(np.unique(y_train))
        self.len_train = len(train)
        self.len_val = len(val)

        self.batch_size = batch_size
        self.model = self.buildModel()

    def buildModel(self):

        model = models.Sequential()
        model.add(layers.Conv2D(32, (3, 3), activation='relu', padding="same", input_shape=self.input_shape))
        model.add(layers.Conv2D(64, (3, 3), activation='relu', padding="same", input_shape=self.input_shape))
        model.add(layers.MaxPooling2D((2, 2)))
        model.add(layers.Conv2D(64, (3, 3), activation='relu', padding="same"))
        model.add(layers.Conv2D(128, (3, 3), activation='relu', padding="same"))
        model.add(layers.MaxPooling2D((2, 2)))
        model.add(layers.Conv2D(96, (3, 3), activation='relu', padding="same"))
        model.add(layers.Conv2D(192, (3, 3), activation='relu', padding="same"))
        model.add(layers.MaxPooling2D((2, 2)))
        model.add(layers.Conv2D(128, (3, 3), activation='relu', padding="same"))
        model.add(layers.Conv2D(256, (3, 3), activation='relu', padding="same"))
        model.add(layers.MaxPooling2D((2, 2)))
        model.add(layers.Conv2D(160, (3, 3), activation='relu', padding="same"))
        model.add(layers.Conv2D(320, (3, 3), activation='relu', padding="same"))
        model.add(layers.AveragePooling2D(pool_size=(4, 4)))
        model.add(layers.Flatten())
        model.add(layers.Dense(128, activation='tanh'))
        model.add(layers.Dropout(rate=0.1))
        model.add(layers.Dense(self.num_classes, activation="softmax"))  # Classification (output) layer
        opt = tf.keras.optimizers.Adam(learning_rate=0.00001)
        model.compile(optimizer=opt, loss='sparse_categorical_crossentropy', metrics=['accuracy'])

        return model

    def trainModel(self, epochs):

        self.model.fit_generator(generator=self.train_batch_gen,
                                 steps_per_epoch=int(self.len_train // self.batch_size),
                                 epochs=epochs,
                                 validation_data=self.val_batch_gen,
                                 validation_steps=int(self.len_val // self.batch_size))

asked Jul 24 '19 12:07

Will

1 Answer

In my case, I used sparse_categorical_crossentropy with labels numbered [1, 2, 3] (3 classes), and it produced NaNs from the start.

When I changed the labels from [1, 2, 3] to [0, 1, 2], the problem disappeared.
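The reason this helps is that sparse_categorical_crossentropy expects integer labels in the range 0 to num_classes - 1. In the question, num_classes is computed as len(np.unique(y_train)), so if the identity labels start at 1 or have gaps, some of them may fall outside that range. A minimal sketch of one way to remap the labels before building the generators follows. The helper name remap_labels and the sample label values are made up for illustration, and building the class list from both the training and validation labels is an assumption, not something the question states.

```python
import numpy as np

def remap_labels(y_train, y_val):
    """Map arbitrary integer class IDs to contiguous indices 0..num_classes-1.

    Hypothetical helper for illustration; not part of the original code.
    """
    # Sorted array of every distinct class ID seen in either split.
    classes = np.unique(np.concatenate([y_train, y_val]))
    # np.unique returns sorted values, so searchsorted gives each ID's index.
    y_train_idx = np.searchsorted(classes, y_train)
    y_val_idx = np.searchsorted(classes, y_val)
    return y_train_idx, y_val_idx, len(classes)

# Made-up labels that start at 1 and are not contiguous.
y_train = np.array([1, 3, 3, 7, 10177])
y_val = np.array([7, 1])

y_train_idx, y_val_idx, num_classes = remap_labels(y_train, y_val)

# Every remapped label now lies in [0, num_classes).
assert y_train_idx.min() >= 0 and y_train_idx.max() < num_classes
assert y_val_idx.min() >= 0 and y_val_idx.max() < num_classes
```

With labels remapped this way, the returned num_classes could be used for the size of the final Dense layer instead of len(np.unique(y_train)), so the output layer and the label range agree.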

answered Oct 23 '22 17:10

Timbus Calin

Tags:

python

tensorflow

deep-learning

keras

conv-neural-network