How does TensorFlow SparseCategoricalCrossentropy work?

I'm trying to understand the SparseCategoricalCrossentropy loss function in TensorFlow, but I don't get it. All the other loss functions require outputs and labels of the same shape; this specific loss function doesn't.

Source code:

import tensorflow as tf

scce = tf.keras.losses.SparseCategoricalCrossentropy()
Loss = scce(
    tf.constant([1, 1, 1, 2], tf.float32),
    tf.constant([[1, 2], [3, 4], [5, 6], [7, 8]], tf.float32)
)
print("Loss:", Loss.numpy())

The error is:

InvalidArgumentError: Received a label value of 2 which is outside the valid range of [0, 2).  
Label values: 1 1 1 2 [Op:SparseSoftmaxCrossEntropyWithLogits]

How do I pass proper parameters to the loss function SparseCategoricalCrossentropy?

asked Jan 17 '20 by Dee

People also ask

How does sparse_categorical_crossentropy work?

sparse_categorical_crossentropy. Training a neural network involves passing data forward, through the model, and comparing predictions with ground truth labels. This comparison is done by a loss function. In multiclass classification problems, categorical crossentropy loss is the loss function of choice.
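In practice this usually means compiling a model with integer labels directly. A minimal sketch (the 10-class setup, layer sizes, and optimizer are illustrative assumptions, not from the question):

import tensorflow as tf

# Hypothetical 10-class classifier; the last layer outputs raw logits.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dense(10)
])
model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy']
)
# Labels are plain integers in {0, ..., 9}; no one-hot encoding is needed.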

What is the difference between sparse_categorical_crossentropy and Categorical_crossentropy?

categorical_crossentropy ( cce ) expects the targets as a one-hot array with one entry per category, while sparse_categorical_crossentropy ( scce ) expects each target as a single integer index of the matching category.
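A small sketch to make the difference concrete (the logits below are made up); both calls should return the same loss, since only the label encoding differs:

import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1],
                      [0.5, 2.5, 0.3]])
sparse_labels = tf.constant([0, 1])                 # scce: integer class indices
onehot_labels = tf.one_hot(sparse_labels, depth=3)  # cce: [[1,0,0], [0,1,0]]

scce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
cce = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
print(scce(sparse_labels, logits).numpy())  # same value...
print(cce(onehot_labels, logits).numpy())   # ...as this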

What does sparse categorical accuracy mean?

Sparse TopK Categorical Accuracy calculates the percentage of records for which the integer targets (yTrue) are in the top K predictions (yPred). yTrue consists of the index (0 to n-1) of the non-zero targets, instead of the one-hot targets as in TopK Categorical Accuracy.
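For example, with tf.keras.metrics.SparseTopKCategoricalAccuracy (the values below are made up):

import tensorflow as tf

m = tf.keras.metrics.SparseTopKCategoricalAccuracy(k=2)
y_true = tf.constant([2, 1])             # integer targets, no one-hot
y_pred = tf.constant([[0.1, 0.3, 0.6],   # class 2 is the top prediction: hit
                      [0.5, 0.4, 0.1]])  # class 1 is in the top 2: hit
m.update_state(y_true, y_pred)
print(m.result().numpy())  # 1.0, both targets are in the top-2 predictions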

Where is sparse categorical cross-entropy used?

Use sparse categorical crossentropy when your classes are mutually exclusive (i.e. each sample belongs to exactly one class), and categorical crossentropy when one sample can have multiple classes or when the labels are soft probabilities (like [0.5, 0.3, 0.2]).
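For instance, soft labels like the [0.5, 0.3, 0.2] above can only be fed to the dense loss; a sketch with made-up predictions:

import tensorflow as tf

# Soft probability labels can only be expressed one-hot style, so they
# require CategoricalCrossentropy; a single integer index can't encode them.
soft_labels = tf.constant([[0.5, 0.3, 0.2]])
predictions = tf.constant([[0.4, 0.4, 0.2]])     # already probabilities
cce = tf.keras.losses.CategoricalCrossentropy()  # from_logits=False by default
print(cce(soft_labels, predictions).numpy())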


2 Answers

SparseCategoricalCrossentropy and CategoricalCrossentropy both compute categorical cross-entropy. The only difference is in how the targets/labels should be encoded.

When using SparseCategoricalCrossentropy, the targets are represented by the index of the category (starting from 0). Your outputs have shape (4, 2), which means you have two categories. Therefore, the targets should be a vector of length 4 in which each entry is either 0 or 1. For example:

scce = tf.keras.losses.SparseCategoricalCrossentropy()
Loss = scce(
    tf.constant([0, 0, 0, 1], tf.float32),  # class indices, one per sample
    tf.constant([[1, 2], [3, 4], [5, 6], [7, 8]], tf.float32)
)

This is in contrast to CategoricalCrossentropy, where the labels should be one-hot encoded:

cce = tf.keras.losses.CategoricalCrossentropy()
Loss = cce(
    tf.constant([[1, 0], [1, 0], [1, 0], [0, 1]], tf.float32),  # one-hot labels
    tf.constant([[1, 2], [3, 4], [5, 6], [7, 8]], tf.float32)
)

SparseCategoricalCrossentropy is more efficient when you have a lot of categories, since integer labels avoid materializing a large one-hot matrix.
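To see the equivalence (and the saving) concretely, here is a small sketch using the same logits as above; from_logits=True is an assumption added here so the result is well defined:

import tensorflow as tf

logits = tf.constant([[1, 2], [3, 4], [5, 6], [7, 8]], tf.float32)
sparse = tf.constant([0, 0, 0, 1])  # shape (4,), vs a (4, 2) one-hot matrix

scce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
cce = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
print(scce(sparse, logits).numpy())                      # ~1.0632616
print(cce(tf.one_hot(sparse, depth=2), logits).numpy())  # same value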

answered Sep 19 '22 by GeertH


I wanted to add a few more things that may be confusing. SparseCategoricalCrossentropy has two arguments which are very important to specify. The first is from_logits; recall that logits are the raw outputs of a network that haven't been normalized via a softmax (or sigmoid).

The second is reduction. It is normally set to 'auto', which computes the categorical cross-entropy as normal: the average over the batch of the per-example terms -label*log(pred). Setting the value to 'none' instead gives you each per-example term -label*log(pred) individually, as a tensor of shape (batch_size,). Computing a reduce_mean on this tensor will give you the same result as reduction='auto'.

# Assuming TF2.x
import tensorflow as tf

model_predictions = tf.constant([[1, 2], [3, 4], [5, 6], [7, 8]], tf.float32)
labels_sparse = tf.constant([1, 0, 0, 1], tf.float32)
# The same labels, one-hot encoded: [1, 0, 0, 1] -> rows [0,1], [1,0], [1,0], [0,1]
labels_dense = tf.constant([[0, 1], [1, 0], [1, 0], [0, 1]], tf.float32)

loss_obj_scc = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True,
    reduction='auto'
)
loss_from_scc = loss_obj_scc(
    labels_sparse,
    model_predictions,
)

loss_obj_cc = tf.keras.losses.CategoricalCrossentropy(
    from_logits=True,
    reduction='auto'
)
loss_from_cc = loss_obj_cc(
    labels_dense,
    model_predictions,
)


print(loss_from_scc, loss_from_cc)
>> (<tf.Tensor: shape=(), dtype=float32, numpy=0.8132617>,
 <tf.Tensor: shape=(), dtype=float32, numpy=0.8132617>)
# Identical values, since both label tensors encode the same classes.
# With `reduction='none'`
loss_obj_scc_red = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True,
    reduction='none'
)

loss_from_scc_red = loss_obj_scc_red(
    labels_sparse,
    model_predictions,
)

print(loss_from_scc_red, tf.math.reduce_mean(loss_from_scc_red))

>> (<tf.Tensor: shape=(4,), dtype=float32, numpy=array([0.31326166, 1.3132616 , 
1.3132616 , 0.31326166], dtype=float32)>,
 <tf.Tensor: shape=(), dtype=float32, numpy=0.8132617>)
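As a final sanity check on from_logits, a sketch continuing the snippet above: if you softmax the logits yourself, from_logits=False should reproduce the same loss.

# Softmax the logits manually, then treat them as probabilities.
probs = tf.nn.softmax(model_predictions, axis=-1)
loss_obj_probs = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=False,
    reduction='auto'
)
print(loss_obj_probs(labels_sparse, probs).numpy())  # ~0.8132617 again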
answered Sep 20 '22 by Wolfgang