What is the difference between sparse_categorical_crossentropy and categorical_crossentropy? When should one loss be used as opposed to the other? For example, are these losses suitable for linear regression?
Use sparse_categorical_crossentropy when your classes are mutually exclusive (i.e. each sample belongs to exactly one class) and your labels are integer indices, and categorical_crossentropy when your labels are one-hot vectors or soft probability distributions (like [0.5, 0.3, 0.2]). Neither loss is suitable for linear regression; regression targets call for a loss such as mean squared error.
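As a quick sketch of what that looks like in practice (the model here is a made-up 10-class classifier, purely for illustration), the choice comes down to how your labels are stored:

import tensorflow as tf

# Hypothetical 10-class classifier; the architecture is just a placeholder.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Integer labels, e.g. y = [3, 0, 9, ...]
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# One-hot (or soft) labels, e.g. y = [[0, 0, 0, 1, ...], ...]
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])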
The only difference between sparse categorical cross entropy and categorical cross entropy is the format of the true labels. When we have a single-label, multi-class classification problem, the labels are mutually exclusive for each sample, meaning each data entry can only belong to one class.
sparse_categorical_crossentropy: Training a neural network involves passing data forward through the model and comparing the predictions with ground truth labels. This comparison is done by a loss function. In multiclass classification problems, categorical crossentropy loss is the loss function of choice.
categorical_crossentropy: Used as a loss function for multi-class classification models where there are two or more output labels. The output label is assigned a one-hot category encoding in the form of 0s and 1s. If the output label is in integer form, it is converted into one-hot encoding using keras.utils.to_categorical.
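For example, the conversion from integer labels to one-hot labels looks like this (a minimal sketch using keras.utils.to_categorical):

from tensorflow import keras
import numpy as np

y_int = np.array([1, 2, 0])                      # integer class indices
y_onehot = keras.utils.to_categorical(y_int, 3)  # one-hot encoded labels
print(y_onehot)
# [[0. 1. 0.]
#  [0. 0. 1.]
#  [1. 0. 0.]]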
Simply: categorical_crossentropy (cce) expects the target as a one-hot array with one slot per category, while sparse_categorical_crossentropy (scce) expects the target as the single index of the matching category. Consider a classification problem with 5 categories (or classes).
In the case of cce, the one-hot target may be [0, 1, 0, 0, 0] and the model may predict [.2, .5, .1, .1, .1] (probably right). In the case of scce, the target is simply the index [1], while the model still predicts the same full distribution [.2, .5, .1, .1, .1].
Consider now a classification problem with 3 classes. With cce, the one-hot target might be [0, 0, 1] and the model may predict [.5, .1, .4] (probably inaccurate, given that it puts the most probability on the first class). With scce, the same target is just the index [2], and the model again predicts the full distribution [.5, .1, .4].
Many categorical models produce scce-style output (a single index) because it saves space, but you lose a lot of information when a distribution is collapsed to an index (for example, in the 2nd example, index 2 at probability .4 was also very close to index 0 at .5). I generally prefer cce-style output for model reliability.
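To make that information-loss point concrete, here is a small sketch: collapsing the prediction from the second example down to an index with argmax hides how close the runner-up class was.

import numpy as np

probs = np.array([0.5, 0.1, 0.4])  # model prediction from the 3-class example
index = int(np.argmax(probs))      # collapse to a single category index
print(index)     # 0
print(probs[2])  # 0.4 -- nearly as likely as class 0, but invisible once only the index is kept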
There are a number of situations to use scce, including:

- when your classes are mutually exclusive, i.e. each sample belongs to exactly one class; and
- when the number of classes is large, so storing full one-hot targets would waste memory (see the sketch after this list).
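The memory argument is easy to see with a quick example (the dataset sizes here are made up) comparing integer labels against their one-hot equivalent for 1000 classes:

import numpy as np
from tensorflow import keras

# Hypothetical dataset: 100,000 samples, 1000 classes
y_int = np.random.randint(0, 1000, size=100_000).astype(np.int32)
y_onehot = keras.utils.to_categorical(y_int, 1000)

print(y_int.nbytes)     # 400,000 bytes
print(y_onehot.nbytes)  # 400,000,000 bytes (float32 one-hot) -- 1000x larger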
I was also confused by this one. Fortunately, the excellent Keras documentation came to the rescue. Both have the same loss function and are ultimately doing the same thing; the only difference is in the representation of the true labels.
categorical_crossentropy: Use this crossentropy loss function when there are two or more label classes. We expect labels to be provided in a one_hot representation.
>>> y_true = [[0, 1, 0], [0, 0, 1]]
>>> y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
>>> # Using 'auto'/'sum_over_batch_size' reduction type.
>>> cce = tf.keras.losses.CategoricalCrossentropy()
>>> cce(y_true, y_pred).numpy()
1.177
sparse_categorical_crossentropy: Use this crossentropy loss function when there are two or more label classes. We expect labels to be provided as integers.
>>> y_true = [1, 2]
>>> y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
>>> # Using 'auto'/'sum_over_batch_size' reduction type.
>>> scce = tf.keras.losses.SparseCategoricalCrossentropy()
>>> scce(y_true, y_pred).numpy()
1.177
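Note that both snippets produce the same value (1.177): once the integer labels are converted to one-hot form, the two losses compute the same number. A quick check, reusing the same y_pred as above:

import tensorflow as tf

y_true_int = [1, 2]
y_true_onehot = tf.keras.utils.to_categorical(y_true_int, num_classes=3)
y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]

scce = tf.keras.losses.SparseCategoricalCrossentropy()
cce = tf.keras.losses.CategoricalCrossentropy()

print(scce(y_true_int, y_pred).numpy())    # 1.177
print(cce(y_true_onehot, y_pred).numpy())  # 1.177 -- identical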
One good example of sparse_categorical_crossentropy is the Fashion-MNIST dataset, whose labels are already stored as integer class indices.
import tensorflow as tf
from tensorflow import keras

fashion_mnist = keras.datasets.fashion_mnist
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist.load_data()
print(y_train_full.shape)  # (60000,)
print(y_train_full.dtype)  # uint8
y_train_full[:10]  # array([9, 0, 0, 3, 0, 2, 7, 2, 5, 5], dtype=uint8)
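Because the labels are already integers, the model can be compiled with sparse_categorical_crossentropy directly, with no one-hot conversion step. A minimal continuation of the snippet above (the architecture and hyperparameters are just placeholders):

# Continues from the Fashion-MNIST snippet above
model = keras.Sequential([
    keras.layers.Input(shape=(28, 28)),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X_train_full / 255.0, y_train_full, epochs=5)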