How does TensorFlow SparseCategoricalCrossentropy work?

I'm trying to understand the SparseCategoricalCrossentropy loss function in TensorFlow, but I don't get it. All the other loss functions require outputs and labels of the same shape; this specific loss function doesn't.

Source code:

import tensorflow as tf

scce = tf.keras.losses.SparseCategoricalCrossentropy()
Loss = scce(
    tf.constant([1, 1, 1, 2], tf.float32),
    tf.constant([[1, 2], [3, 4], [5, 6], [7, 8]], tf.float32)
)
print("Loss:", Loss.numpy())

The error is:

InvalidArgumentError: Received a label value of 2 which is outside the valid range of [0, 2).  
Label values: 1 1 1 2 [Op:SparseSoftmaxCrossEntropyWithLogits]

How do I pass proper parameters to the loss function SparseCategoricalCrossentropy?

asked Jan 17 '20 by Dee

People also ask

How does sparse_categorical_crossentropy work?

sparse_categorical_crossentropy. Training a neural network involves passing data forward, through the model, and comparing predictions with ground truth labels. This comparison is done by a loss function. In multiclass classification problems, categorical crossentropy loss is the loss function of choice.
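In practice this usually means compiling a model with integer labels directly. A minimal sketch (the 10-class setup, layer sizes, and optimizer are illustrative assumptions, not from the question):

import tensorflow as tf

# Hypothetical 10-class classifier; the last layer outputs raw logits.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dense(10)
])
model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy']
)
# Labels are plain integers in {0, ..., 9}; no one-hot encoding is needed.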

What is the difference between sparse_categorical_crossentropy and Categorical_crossentropy?

categorical_crossentropy ( cce ) expects the targets as a one-hot array with one entry per category, while sparse_categorical_crossentropy ( scce ) expects each target as a single integer index of the matching category.
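A small sketch to make the difference concrete (the logits below are made up); both calls should return the same loss, since only the label encoding differs:

import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1],
                      [0.5, 2.5, 0.3]])
sparse_labels = tf.constant([0, 1])                 # scce: integer class indices
onehot_labels = tf.one_hot(sparse_labels, depth=3)  # cce: [[1,0,0], [0,1,0]]

scce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
cce = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
print(scce(sparse_labels, logits).numpy())  # same value...
print(cce(onehot_labels, logits).numpy())   # ...as this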

What does sparse categorical accuracy mean?

Sparse TopK Categorical Accuracy calculates the percentage of records for which the integer targets (yTrue) are in the top K predictions (yPred). yTrue consists of the index (0 to n-1) of the non-zero targets, instead of the one-hot targets as in TopK Categorical Accuracy.
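For example, with tf.keras.metrics.SparseTopKCategoricalAccuracy (the values below are made up):

import tensorflow as tf

m = tf.keras.metrics.SparseTopKCategoricalAccuracy(k=2)
y_true = tf.constant([2, 1])             # integer targets, no one-hot
y_pred = tf.constant([[0.1, 0.3, 0.6],   # class 2 is the top prediction: hit
                      [0.5, 0.4, 0.1]])  # class 1 is in the top 2: hit
m.update_state(y_true, y_pred)
print(m.result().numpy())  # 1.0, both targets are in the top-2 predictions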

Where is sparse categorical cross-entropy used?

Use sparse categorical crossentropy when your classes are mutually exclusive (i.e. each sample belongs to exactly one class), and categorical crossentropy when one sample can have multiple classes or when the labels are soft probabilities (like [0.5, 0.3, 0.2]).
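For instance, soft labels like the [0.5, 0.3, 0.2] above can only be fed to the dense loss; a sketch with made-up predictions:

import tensorflow as tf

# Soft probability labels can only be expressed one-hot style, so they
# require CategoricalCrossentropy; a single integer index can't encode them.
soft_labels = tf.constant([[0.5, 0.3, 0.2]])
predictions = tf.constant([[0.4, 0.4, 0.2]])     # already probabilities
cce = tf.keras.losses.CategoricalCrossentropy()  # from_logits=False by default
print(cce(soft_labels, predictions).numpy())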


2 Answers

SparseCategoricalCrossentropy and CategoricalCrossentropy both compute categorical cross-entropy. The only difference is in how the targets/labels should be encoded.

When using SparseCategoricalCrossentropy, the targets are represented by the index of the category (starting from 0). Your outputs have shape (4, 2), which means you have two categories. Therefore, the targets should be a vector of length 4 in which each entry is either 0 or 1. For example:

scce = tf.keras.losses.SparseCategoricalCrossentropy()
Loss = scce(
    tf.constant([0, 0, 0, 1], tf.float32),  # class indices, one per sample
    tf.constant([[1, 2], [3, 4], [5, 6], [7, 8]], tf.float32)
)

This is in contrast to CategoricalCrossentropy, where the labels should be one-hot encoded:

cce = tf.keras.losses.CategoricalCrossentropy()
Loss = cce(
    tf.constant([[1, 0], [1, 0], [1, 0], [0, 1]], tf.float32),  # one-hot labels
    tf.constant([[1, 2], [3, 4], [5, 6], [7, 8]], tf.float32)
)

SparseCategoricalCrossentropy is more efficient when you have a lot of categories, since integer labels avoid materializing a large one-hot matrix.
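To see the equivalence (and the saving) concretely, here is a small sketch using the same logits as above; from_logits=True is an assumption added here so the result is well defined:

import tensorflow as tf

logits = tf.constant([[1, 2], [3, 4], [5, 6], [7, 8]], tf.float32)
sparse = tf.constant([0, 0, 0, 1])  # shape (4,), vs a (4, 2) one-hot matrix

scce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
cce = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
print(scce(sparse, logits).numpy())                      # ~1.0632616
print(cce(tf.one_hot(sparse, depth=2), logits).numpy())  # same value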

answered Sep 19 '22 by GeertH


I wanted to add a few more things that may be confusing. SparseCategoricalCrossentropy has two arguments which are very important to specify. The first is from_logits; recall that logits are the raw outputs of a network that haven't been normalized via a softmax (or sigmoid).

The second is reduction. It is normally set to 'auto', which computes the categorical cross-entropy as normal: the average over the batch of the per-example terms -label*log(pred). Setting the value to 'none' instead gives you each per-example term -label*log(pred) individually, as a tensor of shape (batch_size,). Computing a reduce_mean on this tensor will give you the same result as reduction='auto'.

# Assuming TF2.x
import tensorflow as tf

model_predictions = tf.constant([[1, 2], [3, 4], [5, 6], [7, 8]], tf.float32)
labels_sparse = tf.constant([1, 0, 0, 1], tf.float32)
# The same labels, one-hot encoded: [1, 0, 0, 1] -> rows [0,1], [1,0], [1,0], [0,1]
labels_dense = tf.constant([[0, 1], [1, 0], [1, 0], [0, 1]], tf.float32)

loss_obj_scc = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True,
    reduction='auto'
)
loss_from_scc = loss_obj_scc(
    labels_sparse,
    model_predictions,
)

loss_obj_cc = tf.keras.losses.CategoricalCrossentropy(
    from_logits=True,
    reduction='auto'
)
loss_from_cc = loss_obj_cc(
    labels_dense,
    model_predictions,
)


print(loss_from_scc, loss_from_cc)
>> (<tf.Tensor: shape=(), dtype=float32, numpy=0.8132617>,
 <tf.Tensor: shape=(), dtype=float32, numpy=0.8132617>)
# Identical values, since both label tensors encode the same classes.
# With `reduction='none'`
loss_obj_scc_red = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True,
    reduction='none'
)

loss_from_scc_red = loss_obj_scc_red(
    labels_sparse,
    model_predictions,
)

print(loss_from_scc_red, tf.math.reduce_mean(loss_from_scc_red))

>> (<tf.Tensor: shape=(4,), dtype=float32, numpy=array([0.31326166, 1.3132616 , 
1.3132616 , 0.31326166], dtype=float32)>,
 <tf.Tensor: shape=(), dtype=float32, numpy=0.8132617>)
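As a final sanity check on from_logits, a sketch continuing the snippet above: if you softmax the logits yourself, from_logits=False should reproduce the same loss.

# Softmax the logits manually, then treat them as probabilities.
probs = tf.nn.softmax(model_predictions, axis=-1)
loss_obj_probs = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=False,
    reduction='auto'
)
print(loss_obj_probs(labels_sparse, probs).numpy())  # ~0.8132617 again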
answered Sep 20 '22 by Wolfgang