I'm building a ResNet-18 classification model for the Stanford Cars dataset using transfer learning. I would like to implement label smoothing to penalize overconfident predictions and improve generalization. TensorFlow has a simple label_smoothing keyword argument in its cross-entropy loss functions. Has anyone built a similar function for PyTorch that I could plug and play with?
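For context, this is roughly the setup I want to drop such a loss into; a minimal sketch, where LabelSmoothingCrossEntropy is only a placeholder name for whatever plug-and-play implementation might exist:

import torch.nn as nn
from torchvision import models

# ResNet-18 pretrained on ImageNet, re-headed for Stanford Cars (196 classes)
model = models.resnet18(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 196)

# what I'm hoping exists as a drop-in replacement for nn.CrossEntropyLoss:
# criterion = LabelSmoothingCrossEntropy(smoothing=0.1)  # placeholder name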
Label smoothing is a regularization technique that perturbs the target labels to make the model less certain of its predictions. It is viewed as a regularization technique because it restrains the largest logits fed into the softmax function from becoming much bigger than the rest.
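For example, in the usual formulation where the smoothed target is a weighted average of the one-hot target and the uniform distribution, a smoothing factor of 0.1 over 5 classes turns [0, 0, 1, 0, 0] into [0.02, 0.02, 0.92, 0.02, 0.02]. A minimal numeric sketch (the values are illustrative only):

import torch

eps, K = 0.1, 5                                # smoothing factor, number of classes
hard = torch.tensor([0., 0., 1., 0., 0.])      # one-hot target for class 2
smooth = (1 - eps) * hard + eps / K            # mix with the uniform distribution
print(smooth)  # tensor([0.0200, 0.0200, 0.9200, 0.0200, 0.0200])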
Label smoothing has been used successfully to improve the accuracy of deep learning models across a range of tasks, including image classification, speech recognition, and machine translation.
For reference, PyTorch's CrossEntropyLoss criterion computes the cross-entropy loss between input and target. It is useful when training a classification problem with C classes. If provided, the optional weight argument should be a 1D Tensor assigning a weight to each class; this is particularly useful when you have an unbalanced training set.
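For instance, a minimal sketch of that weight argument (the class-weight values below are made up purely for illustration):

import torch
import torch.nn as nn

# up-weight the rarer classes of an imbalanced 3-class problem
class_weights = torch.tensor([0.5, 1.0, 2.0])
criterion = nn.CrossEntropyLoss(weight=class_weights)

logits = torch.randn(4, 3)             # raw (unnormalized) scores, shape (batch, C)
targets = torch.tensor([0, 2, 1, 2])   # integer class labels
loss = criterion(logits, targets)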
Cross-entropy loss, or log loss, measures the performance of a classification model whose output is a probability value between 0 and 1. It is preferred for classification, while mean squared error (MSE) is one of the better choices for regression; this follows directly from the nature of the two problems.
The generalization and learning speed of a multi-class neural network can often be significantly improved by using soft targets that are a weighted average of the hard targets and the uniform distribution over labels. Smoothing the labels in this way prevents the network from becoming over-confident and label smoothing has been used in many state-of-the-art models, including image classification, language translation, and speech recognition.
Label smoothing is already implemented in TensorFlow within the cross-entropy loss functions (BinaryCrossentropy, CategoricalCrossentropy). At the time this was written there was no official implementation of label smoothing in PyTorch, but there was an active discussion on adding it; here is that discussion thread: Issue #7455. (Label smoothing has since been added to PyTorch; see the update near the end of this post.)
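For reference, a minimal sketch of the TensorFlow/Keras version (label_smoothing is the documented argument of those losses; the tensors below are illustrative):

import tensorflow as tf

# CategoricalCrossentropy expects one-hot targets; smoothing is applied internally
loss_fn = tf.keras.losses.CategoricalCrossentropy(label_smoothing=0.1)

y_true = tf.constant([[0., 1., 0.], [1., 0., 0.]])         # one-hot labels
y_pred = tf.constant([[0.1, 0.8, 0.1], [0.7, 0.2, 0.1]])   # predicted probabilities
loss = loss_fn(y_true, y_pred)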
Here we will bring together some of the best available implementations of label smoothing (LS) from PyTorch practitioners. Basically, there are many ways to implement LS; please refer to this specific discussion on it, one is here, and another here. We present implementations in two distinct ways, with two versions of each, for a total of four.
In this first way, the loss accepts a one-hot target vector, and the user must manually smooth that target vector. The smoothing can be done within a with torch.no_grad() scope, since it temporarily disables gradient tracking (tensors created inside it behave as if requires_grad were False).
import torch
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from torch.nn.modules.loss import _WeightedLoss


class LabelSmoothingLoss(nn.Module):
    def __init__(self, classes, smoothing=0.0, dim=-1, weight=None):
        """if smoothing == 0, it's one-hot method
           if 0 < smoothing < 1, it's smooth method
        """
        super(LabelSmoothingLoss, self).__init__()
        self.confidence = 1.0 - smoothing
        self.smoothing = smoothing
        self.weight = weight
        self.cls = classes
        self.dim = dim

    def forward(self, pred, target):
        assert 0 <= self.smoothing < 1
        pred = pred.log_softmax(dim=self.dim)
        if self.weight is not None:
            pred = pred * self.weight.unsqueeze(0)

        with torch.no_grad():
            true_dist = torch.zeros_like(pred)
            true_dist.fill_(self.smoothing / (self.cls - 1))
            true_dist.scatter_(1, target.data.unsqueeze(1), self.confidence)
        return torch.mean(torch.sum(-true_dist * pred, dim=self.dim))
Additionally, we've added an assertion check on self.smoothing and loss-weighting support to this implementation.
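For example, a quick usage sketch of the class above with weighting enabled (the weight values are made up for illustration):

import torch

# hypothetical 5-class example with per-class loss weights
weights = torch.tensor([1.0, 2.0, 1.0, 1.0, 0.5])
criterion = LabelSmoothingLoss(classes=5, smoothing=0.1, weight=weights)

logits = torch.randn(8, 5)               # raw model outputs for a batch of 8
targets = torch.randint(0, 5, (8,))      # integer class labels
loss = criterion(logits, targets)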
Shital already posted the answer here. We're pointing out that this implementation is similar to Devin Yang's implementation above; however, his code is reproduced here with slightly simplified syntax.
class SmoothCrossEntropyLoss(_WeightedLoss):
    def __init__(self, weight=None, reduction='mean', smoothing=0.0):
        super().__init__(weight=weight, reduction=reduction)
        self.smoothing = smoothing
        self.weight = weight
        self.reduction = reduction

    def k_one_hot(self, targets: torch.Tensor, n_classes: int, smoothing=0.0):
        with torch.no_grad():
            targets = torch.empty(size=(targets.size(0), n_classes),
                                  device=targets.device) \
                .fill_(smoothing / (n_classes - 1)) \
                .scatter_(1, targets.data.unsqueeze(1), 1. - smoothing)
        return targets

    def reduce_loss(self, loss):
        return loss.mean() if self.reduction == 'mean' else loss.sum() \
            if self.reduction == 'sum' else loss

    def forward(self, inputs, targets):
        assert 0 <= self.smoothing < 1
        targets = self.k_one_hot(targets, inputs.size(-1), self.smoothing)
        log_preds = F.log_softmax(inputs, -1)

        if self.weight is not None:
            log_preds = log_preds * self.weight.unsqueeze(0)

        return self.reduce_loss(-(targets * log_preds).sum(dim=-1))
Check
import torch
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from torch.nn.modules.loss import _WeightedLoss

if __name__ == "__main__":
    # 1. Devin Yang
    crit = LabelSmoothingLoss(classes=5, smoothing=0.5)
    predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                                 [0, 0.9, 0.2, 0.2, 1],
                                 [1, 0.2, 0.7, 0.9, 1]])
    v = crit(Variable(predict), Variable(torch.LongTensor([2, 1, 0])))
    print(v)

    # 2. Shital Shah
    crit = SmoothCrossEntropyLoss(smoothing=0.5)
    predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                                 [0, 0.9, 0.2, 0.2, 1],
                                 [1, 0.2, 0.7, 0.9, 1]])
    v = crit(Variable(predict), Variable(torch.LongTensor([2, 1, 0])))
    print(v)

Output:

tensor(1.4178)
tensor(1.4178)
In this second way, the loss accepts the plain (integer) target vector, and the user doesn't manually smooth it; instead, the module takes care of the label smoothing internally. It allows us to implement label smoothing in terms of F.nll_loss.
(a) Wangleiofficial: Source - (AFAIK) original poster
(b) Datasaurus: Source - added weighting support
Further, we have slightly trimmed the code write-up to make it more concise.
class LabelSmoothingLoss(torch.nn.Module):
    def __init__(self, smoothing: float = 0.1, reduction="mean", weight=None):
        super(LabelSmoothingLoss, self).__init__()
        self.smoothing = smoothing
        self.reduction = reduction
        self.weight = weight

    def reduce_loss(self, loss):
        return loss.mean() if self.reduction == 'mean' else loss.sum() \
            if self.reduction == 'sum' else loss

    def linear_combination(self, x, y):
        return self.smoothing * x + (1 - self.smoothing) * y

    def forward(self, preds, target):
        assert 0 <= self.smoothing < 1

        if self.weight is not None:
            self.weight = self.weight.to(preds.device)

        n = preds.size(-1)
        log_preds = F.log_softmax(preds, dim=-1)
        loss = self.reduce_loss(-log_preds.sum(dim=-1))
        nll = F.nll_loss(
            log_preds, target, reduction=self.reduction, weight=self.weight
        )
        return self.linear_combination(loss / n, nll)
(c) NVIDIA: Source

class LabelSmoothing(nn.Module):
    """NLL loss with label smoothing."""

    def __init__(self, smoothing=0.0):
        """Constructor for the LabelSmoothing module.

        :param smoothing: label smoothing factor
        """
        super(LabelSmoothing, self).__init__()
        self.confidence = 1.0 - smoothing
        self.smoothing = smoothing

    def forward(self, x, target):
        logprobs = torch.nn.functional.log_softmax(x, dim=-1)
        nll_loss = -logprobs.gather(dim=-1, index=target.unsqueeze(1))
        nll_loss = nll_loss.squeeze(1)
        smooth_loss = -logprobs.mean(dim=-1)
        loss = self.confidence * nll_loss + self.smoothing * smooth_loss
        return loss.mean()
Check
if __name__ == "__main__":
    # Wangleiofficial
    crit = LabelSmoothingLoss(smoothing=0.3, reduction="mean")
    predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                                 [0, 0.9, 0.2, 0.2, 1],
                                 [1, 0.2, 0.7, 0.9, 1]])
    v = crit(Variable(predict), Variable(torch.LongTensor([2, 1, 0])))
    print(v)

    # NVIDIA
    crit = LabelSmoothing(smoothing=0.3)
    predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                                 [0, 0.9, 0.2, 0.2, 1],
                                 [1, 0.2, 0.7, 0.9, 1]])
    v = crit(Variable(predict), Variable(torch.LongTensor([2, 1, 0])))
    print(v)

Output:

tensor(1.3883)
tensor(1.3883)
Update: label smoothing is now supported natively in PyTorch (since version 1.10) via the label_smoothing argument of torch.nn.CrossEntropyLoss:

torch.nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean', label_smoothing=0.0)
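So on PyTorch 1.10 or newer, the simplest plug-and-play option is the built-in argument. A minimal sketch (0.1 is just a commonly used smoothing value, not a recommendation; 196 is the Stanford Cars class count from the question):

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

logits = torch.randn(8, 196)             # raw model outputs, e.g. from ResNet-18
targets = torch.randint(0, 196, (8,))    # integer class labels
loss = criterion(logits, targets)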