
What is the difference between sparse_categorical_crossentropy and categorical_crossentropy?

What is the difference between sparse_categorical_crossentropy and categorical_crossentropy? When should one loss be used as opposed to the other? For example, are these losses suitable for linear regression?

asked Oct 25 '19 by xpertdev


People also ask

When should I use Categorical_crossentropy?

Use sparse categorical crossentropy when your classes are mutually exclusive (i.e. each sample belongs to exactly one class) and categorical crossentropy when one sample can have multiple classes or the labels are soft probabilities (like [0.5, 0.3, 0.2]).

What is the difference between Sparsecategoricalcrossentropy and Categoricalcrossentropy?

The only difference between sparse categorical cross entropy and categorical cross entropy is the format of the true labels. In a single-label, multi-class classification problem, the labels are mutually exclusive, meaning each sample can belong to only one class.

How does sparse_categorical_crossentropy work?

Training a neural network involves passing data forward through the model and comparing the predictions with the ground-truth labels. This comparison is done by a loss function. In multiclass classification problems, categorical crossentropy loss is the loss function of choice; sparse_categorical_crossentropy computes that same loss from integer labels.

What is Categorical_crossentropy in keras?

categorical_crossentropy: Used as a loss function for multi-class classification models where there are two or more output labels. The output label is assigned a one-hot category encoding in the form of 0s and 1s. If the labels are in integer form, they can be converted into this one-hot encoding with keras's to_categorical utility, as sketched below.
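
For illustration, a minimal sketch of that conversion (the labels here are invented for the example):

from tensorflow.keras.utils import to_categorical

# Integer labels for a hypothetical 3-class problem.
labels = [0, 2, 1, 2]

# One-hot rows suitable for categorical_crossentropy.
one_hot = to_categorical(labels, num_classes=3)
print(one_hot)
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [0. 0. 1.]]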


2 Answers

Simply:

  • categorical_crossentropy (cce) expects the target as a one-hot array with a probability for each category,
  • sparse_categorical_crossentropy (scce) expects the target as a single integer index of the matching category.

Consider a classification problem with 5 categories (or classes).

  • In the case of cce, the one-hot target may be [0, 1, 0, 0, 0] and the model may predict [.2, .5, .1, .1, .1] (probably right, since class 1 gets the highest probability)

  • In the case of scce, the target is simply the index [1], and the loss looks only at the model's predicted probability for that index (here .5).

Consider now a classification problem with 3 classes.

  • In the case of cce, the one-hot target might be [0, 0, 1] and the model may predict [.5, .1, .4] (probably inaccurate, given that it gives more probability to the first class)
  • In the case of scce, the target is the index [2], and the loss looks at the model's predicted probability for that index (here .4)

Many categorical models report only the winning index because it saves space, but you lose a lot of the information carried by the full probability vector (in the 3-class example above, the true class at index 2, at .4, was a very close runner-up to index 0, at .5). I generally prefer cce-style one-hot output for model reliability. The sketch below works through the arithmetic.
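
To make the arithmetic concrete, here is a minimal numpy sketch using the numbers from the 5-class example above: for a single sample, both losses reduce to -log of the probability the model assigns to the true class, so cce and scce agree whenever the labels encode the same class.

import numpy as np

# Prediction from the 5-class example above.
pred = np.array([.2, .5, .1, .1, .1])

# cce view: dot the one-hot target with the log-probabilities.
one_hot = np.array([0, 1, 0, 0, 0])
cce_loss = -np.sum(one_hot * np.log(pred))

# scce view: index straight into the prediction.
target_index = 1
scce_loss = -np.log(pred[target_index])

print(cce_loss, scce_loss)  # both ~0.693, identical by construction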

There are a number of situations in which to use scce, including:

  • when your classes are mutually exclusive, i.e. you don't care at all about other close-enough predictions,
  • when the number of categories is so large that a one-hot representation becomes overwhelming (see the sketch below).
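
As a rough illustration of that second point, a sketch comparing label storage; the class and sample counts are invented for the example:

import numpy as np

# Hypothetical large-vocabulary setup: 50,000 classes, 1,000 samples.
num_classes, num_samples = 50_000, 1_000

# scce targets: one integer per sample.
sparse_labels = np.random.randint(0, num_classes, size=num_samples)

# cce targets: a full one-hot row per sample.
one_hot_labels = np.zeros((num_samples, num_classes), dtype=np.float32)
one_hot_labels[np.arange(num_samples), sparse_labels] = 1.0

print(sparse_labels.nbytes)   # 8,000 bytes (64-bit ints)
print(one_hot_labels.nbytes)  # 200,000,000 bytes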
answered Oct 06 '22 by dturvene


I was also confused by this one. Fortunately, the excellent keras documentation came to the rescue. Both compute the same loss and are ultimately doing the same thing; the only difference is in the representation of the true labels.

  • Categorical Cross Entropy [Doc]:

Use this crossentropy loss function when there are two or more label classes. We expect labels to be provided in a one_hot representation.

>>> y_true = [[0, 1, 0], [0, 0, 1]]
>>> y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
>>> # Using 'auto'/'sum_over_batch_size' reduction type.
>>> cce = tf.keras.losses.CategoricalCrossentropy()
>>> cce(y_true, y_pred).numpy()
1.177
  • Sparse Categorical Cross Entropy [Doc]:

Use this crossentropy loss function when there are two or more label classes. We expect labels to be provided as integers.

>>> y_true = [1, 2]
>>> y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
>>> # Using 'auto'/'sum_over_batch_size' reduction type.
>>> scce = tf.keras.losses.SparseCategoricalCrossentropy()
>>> scce(y_true, y_pred).numpy()
1.177

One good example of sparse categorical cross entropy is the Fashion-MNIST dataset, whose labels come as plain integers:

import tensorflow as tf
from tensorflow import keras

fashion_mnist = keras.datasets.fashion_mnist
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist.load_data()

print(y_train_full.shape)  # (60000,)
print(y_train_full.dtype)  # uint8

y_train_full[:10]  # array([9, 0, 0, 3, 0, 2, 7, 2, 5, 5], dtype=uint8)
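
Because the labels are already integers, such a dataset can feed sparse_categorical_crossentropy directly, with no one-hot step. A minimal sketch continuing the snippet above (the model architecture and hyperparameters are assumptions for illustration, not part of the original answer):

# Tiny classifier over the 28x28 images; 10 output classes.
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax'),
])

# Integer labels go in as-is thanks to the sparse loss.
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(X_train_full / 255.0, y_train_full, epochs=1)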
answered Oct 06 '22 by Bitswazsky