What is the difference between sparse_categorical_crossentropy and categorical_crossentropy? When should one loss be used as opposed to the other? For example, are these losses suitable for linear regression?
Use sparse_categorical_crossentropy when your classes are mutually exclusive (i.e. each sample belongs to exactly one class) and your labels are integer indices, and categorical_crossentropy when your labels are one-hot vectors or soft probability distributions (like [0.5, 0.3, 0.2]). Neither loss is suitable for linear regression; regression targets call for a loss such as mean squared error.
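As a quick sketch of what that looks like in practice (the model here is a made-up 10-class classifier, purely for illustration), the choice comes down to how your labels are stored:

import tensorflow as tf

# Hypothetical 10-class classifier; the architecture is just a placeholder.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Integer labels, e.g. y = [3, 0, 9, ...]
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# One-hot (or soft) labels, e.g. y = [[0, 0, 0, 1, ...], ...]
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])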
The only difference between sparse categorical cross entropy and categorical cross entropy is the format of the true labels. When we have a single-label, multi-class classification problem, the labels are mutually exclusive for each sample, meaning each data entry can only belong to one class.
sparse_categorical_crossentropy: Training a neural network involves passing data forward through the model and comparing the predictions with ground truth labels. This comparison is done by a loss function. In multiclass classification problems, categorical crossentropy loss is the loss function of choice.
categorical_crossentropy: Used as a loss function for multi-class classification models where there are two or more output labels. The output label is assigned a one-hot category encoding in the form of 0s and 1s. If the output label is in integer form, it is converted into one-hot encoding using keras.utils.to_categorical.
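For example, the conversion from integer labels to one-hot labels looks like this (a minimal sketch using keras.utils.to_categorical):

from tensorflow import keras
import numpy as np

y_int = np.array([1, 2, 0])                      # integer class indices
y_onehot = keras.utils.to_categorical(y_int, 3)  # one-hot encoded labels
print(y_onehot)
# [[0. 1. 0.]
#  [0. 0. 1.]
#  [1. 0. 0.]]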
Simply: categorical_crossentropy (cce) expects the target as a one-hot array with one slot per category, while sparse_categorical_crossentropy (scce) expects the target as the single index of the matching category. Consider a classification problem with 5 categories (or classes).
In the case of cce, the one-hot target may be [0, 1, 0, 0, 0] and the model may predict [.2, .5, .1, .1, .1] (probably right). In the case of scce, the target is simply the index [1], while the model still predicts the same full distribution [.2, .5, .1, .1, .1].
Consider now a classification problem with 3 classes. With cce, the one-hot target might be [0, 0, 1] and the model may predict [.5, .1, .4] (probably inaccurate, given that it puts the most probability on the first class). With scce, the same target is just the index [2], and the model again predicts the full distribution [.5, .1, .4].
Many categorical models produce scce-style output (a single index) because it saves space, but you lose a lot of information when a distribution is collapsed to an index (for example, in the 2nd example, index 2 at probability .4 was also very close to index 0 at .5). I generally prefer cce-style output for model reliability.
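To make that information-loss point concrete, here is a small sketch: collapsing the prediction from the second example down to an index with argmax hides how close the runner-up class was.

import numpy as np

probs = np.array([0.5, 0.1, 0.4])  # model prediction from the 3-class example
index = int(np.argmax(probs))      # collapse to a single category index
print(index)     # 0
print(probs[2])  # 0.4 -- nearly as likely as class 0, but invisible once only the index is kept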
There are a number of situations to use scce, including:

- when your classes are mutually exclusive, i.e. each sample belongs to exactly one class; and
- when the number of classes is large, so storing full one-hot targets would waste memory (see the sketch after this list).
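The memory argument is easy to see with a quick example (the dataset sizes here are made up) comparing integer labels against their one-hot equivalent for 1000 classes:

import numpy as np
from tensorflow import keras

# Hypothetical dataset: 100,000 samples, 1000 classes
y_int = np.random.randint(0, 1000, size=100_000).astype(np.int32)
y_onehot = keras.utils.to_categorical(y_int, 1000)

print(y_int.nbytes)     # 400,000 bytes
print(y_onehot.nbytes)  # 400,000,000 bytes (float32 one-hot) -- 1000x larger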
I was also confused by this one. Fortunately, the excellent Keras documentation came to the rescue. Both have the same loss function and are ultimately doing the same thing; the only difference is in the representation of the true labels.
categorical_crossentropy: Use this crossentropy loss function when there are two or more label classes. We expect labels to be provided in a one_hot representation.
>>> y_true = [[0, 1, 0], [0, 0, 1]]
>>> y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
>>> # Using 'auto'/'sum_over_batch_size' reduction type.
>>> cce = tf.keras.losses.CategoricalCrossentropy()
>>> cce(y_true, y_pred).numpy()
1.177
sparse_categorical_crossentropy: Use this crossentropy loss function when there are two or more label classes. We expect labels to be provided as integers.
>>> y_true = [1, 2]
>>> y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
>>> # Using 'auto'/'sum_over_batch_size' reduction type.
>>> scce = tf.keras.losses.SparseCategoricalCrossentropy()
>>> scce(y_true, y_pred).numpy()
1.177
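Note that both snippets produce the same value (1.177): once the integer labels are converted to one-hot form, the two losses compute the same number. A quick check, reusing the same y_pred as above:

import tensorflow as tf

y_true_int = [1, 2]
y_true_onehot = tf.keras.utils.to_categorical(y_true_int, num_classes=3)
y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]

scce = tf.keras.losses.SparseCategoricalCrossentropy()
cce = tf.keras.losses.CategoricalCrossentropy()

print(scce(y_true_int, y_pred).numpy())    # 1.177
print(cce(y_true_onehot, y_pred).numpy())  # 1.177 -- identical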
One good example of sparse_categorical_crossentropy is the Fashion-MNIST dataset, whose labels are already stored as integer class indices.
import tensorflow as tf
from tensorflow import keras

fashion_mnist = keras.datasets.fashion_mnist
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist.load_data()
print(y_train_full.shape)  # (60000,)
print(y_train_full.dtype)  # uint8
y_train_full[:10]  # array([9, 0, 0, 3, 0, 2, 7, 2, 5, 5], dtype=uint8)
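Because the labels are already integers, the model can be compiled with sparse_categorical_crossentropy directly, with no one-hot conversion step. A minimal continuation of the snippet above (the architecture and hyperparameters are just placeholders):

# Continues from the Fashion-MNIST snippet above
model = keras.Sequential([
    keras.layers.Input(shape=(28, 28)),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X_train_full / 255.0, y_train_full, epochs=5)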