I'm building a ResNet-18 classification model for the Stanford Cars dataset using transfer learning. I would like to implement label smoothing to penalize overconfident predictions and improve generalization. TensorFlow has a simple label_smoothing keyword argument in its cross-entropy loss functions. Has anyone built a similar function for PyTorch that I could plug and play with?
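For context, this is roughly the setup I want to drop such a loss into; a minimal sketch, where LabelSmoothingCrossEntropy is only a placeholder name for whatever plug-and-play implementation might exist:

import torch.nn as nn
from torchvision import models

# ResNet-18 pretrained on ImageNet, re-headed for Stanford Cars (196 classes)
model = models.resnet18(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 196)

# what I'm hoping exists as a drop-in replacement for nn.CrossEntropyLoss:
# criterion = LabelSmoothingCrossEntropy(smoothing=0.1)  # placeholder name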
Label smoothing is a regularization technique that perturbs the target labels to make the model less certain of its predictions. It is viewed as a regularization technique because it restrains the largest logits fed into the softmax function from becoming much bigger than the rest.
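For example, in the usual formulation where the smoothed target is a weighted average of the one-hot target and the uniform distribution, a smoothing factor of 0.1 over 5 classes turns [0, 0, 1, 0, 0] into [0.02, 0.02, 0.92, 0.02, 0.02]. A minimal numeric sketch (the values are illustrative only):

import torch

eps, K = 0.1, 5                                # smoothing factor, number of classes
hard = torch.tensor([0., 0., 1., 0., 0.])      # one-hot target for class 2
smooth = (1 - eps) * hard + eps / K            # mix with the uniform distribution
print(smooth)  # tensor([0.0200, 0.0200, 0.9200, 0.0200, 0.0200])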
Label smoothing has been used successfully to improve the accuracy of deep learning models across a range of tasks, including image classification, speech recognition, and machine translation.
For reference, PyTorch's CrossEntropyLoss criterion computes the cross-entropy loss between input and target. It is useful when training a classification problem with C classes. If provided, the optional weight argument should be a 1D Tensor assigning a weight to each class; this is particularly useful when you have an unbalanced training set.
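For instance, a minimal sketch of that weight argument (the class-weight values below are made up purely for illustration):

import torch
import torch.nn as nn

# up-weight the rarer classes of an imbalanced 3-class problem
class_weights = torch.tensor([0.5, 1.0, 2.0])
criterion = nn.CrossEntropyLoss(weight=class_weights)

logits = torch.randn(4, 3)             # raw (unnormalized) scores, shape (batch, C)
targets = torch.tensor([0, 2, 1, 2])   # integer class labels
loss = criterion(logits, targets)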
Cross-entropy loss, or log loss, measures the performance of a classification model whose output is a probability value between 0 and 1. It is preferred for classification, while mean squared error (MSE) is one of the better choices for regression; this follows directly from the nature of the two problems.
The generalization and learning speed of a multi-class neural network can often be significantly improved by using soft targets that are a weighted average of the hard targets and the uniform distribution over labels. Smoothing the labels in this way prevents the network from becoming over-confident and label smoothing has been used in many state-of-the-art models, including image classification, language translation, and speech recognition.
Label smoothing is already implemented in TensorFlow within the cross-entropy loss functions (BinaryCrossentropy, CategoricalCrossentropy). At the time this was written there was no official implementation of label smoothing in PyTorch, but there was an active discussion on adding it; here is that discussion thread: Issue #7455. (Label smoothing has since been added to PyTorch; see the update near the end of this post.)
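For reference, a minimal sketch of the TensorFlow/Keras version (label_smoothing is the documented argument of those losses; the tensors below are illustrative):

import tensorflow as tf

# CategoricalCrossentropy expects one-hot targets; smoothing is applied internally
loss_fn = tf.keras.losses.CategoricalCrossentropy(label_smoothing=0.1)

y_true = tf.constant([[0., 1., 0.], [1., 0., 0.]])         # one-hot labels
y_pred = tf.constant([[0.1, 0.8, 0.1], [0.7, 0.2, 0.1]])   # predicted probabilities
loss = loss_fn(y_true, y_pred)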
Here we will bring together some of the best available implementations of label smoothing (LS) from PyTorch practitioners. Basically, there are many ways to implement LS; please refer to this specific discussion on it, one is here, and another here. We present implementations in two distinct ways, with two versions of each, for a total of four.
In this first way, the loss accepts a one-hot target vector, and the user must manually smooth that target vector. The smoothing can be done within a with torch.no_grad() scope, since it temporarily disables gradient tracking (tensors created inside it behave as if requires_grad were False).
import torch
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from torch.nn.modules.loss import _WeightedLoss


class LabelSmoothingLoss(nn.Module):
    def __init__(self, classes, smoothing=0.0, dim=-1, weight=None):
        """if smoothing == 0, it's one-hot method
           if 0 < smoothing < 1, it's smooth method
        """
        super(LabelSmoothingLoss, self).__init__()
        self.confidence = 1.0 - smoothing
        self.smoothing = smoothing
        self.weight = weight
        self.cls = classes
        self.dim = dim

    def forward(self, pred, target):
        assert 0 <= self.smoothing < 1
        pred = pred.log_softmax(dim=self.dim)
        if self.weight is not None:
            pred = pred * self.weight.unsqueeze(0)

        with torch.no_grad():
            true_dist = torch.zeros_like(pred)
            true_dist.fill_(self.smoothing / (self.cls - 1))
            true_dist.scatter_(1, target.data.unsqueeze(1), self.confidence)
        return torch.mean(torch.sum(-true_dist * pred, dim=self.dim))
Additionally, we've added an assertion check on self.smoothing and loss-weighting support to this implementation.
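For example, a quick usage sketch of the class above with weighting enabled (the weight values are made up for illustration):

import torch

# hypothetical 5-class example with per-class loss weights
weights = torch.tensor([1.0, 2.0, 1.0, 1.0, 0.5])
criterion = LabelSmoothingLoss(classes=5, smoothing=0.1, weight=weights)

logits = torch.randn(8, 5)               # raw model outputs for a batch of 8
targets = torch.randint(0, 5, (8,))      # integer class labels
loss = criterion(logits, targets)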
Shital already posted the answer here. We're pointing out that this implementation is similar to Devin Yang's implementation above; however, his code is reproduced here with slightly simplified syntax.
class SmoothCrossEntropyLoss(_WeightedLoss):
    def __init__(self, weight=None, reduction='mean', smoothing=0.0):
        super().__init__(weight=weight, reduction=reduction)
        self.smoothing = smoothing
        self.weight = weight
        self.reduction = reduction

    def k_one_hot(self, targets: torch.Tensor, n_classes: int, smoothing=0.0):
        with torch.no_grad():
            targets = torch.empty(size=(targets.size(0), n_classes),
                                  device=targets.device) \
                .fill_(smoothing / (n_classes - 1)) \
                .scatter_(1, targets.data.unsqueeze(1), 1. - smoothing)
        return targets

    def reduce_loss(self, loss):
        return loss.mean() if self.reduction == 'mean' else loss.sum() \
            if self.reduction == 'sum' else loss

    def forward(self, inputs, targets):
        assert 0 <= self.smoothing < 1
        targets = self.k_one_hot(targets, inputs.size(-1), self.smoothing)
        log_preds = F.log_softmax(inputs, -1)

        if self.weight is not None:
            log_preds = log_preds * self.weight.unsqueeze(0)

        return self.reduce_loss(-(targets * log_preds).sum(dim=-1))
Check
import torch
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from torch.nn.modules.loss import _WeightedLoss

if __name__ == "__main__":
    # 1. Devin Yang
    crit = LabelSmoothingLoss(classes=5, smoothing=0.5)
    predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                                 [0, 0.9, 0.2, 0.2, 1],
                                 [1, 0.2, 0.7, 0.9, 1]])
    v = crit(Variable(predict), Variable(torch.LongTensor([2, 1, 0])))
    print(v)

    # 2. Shital Shah
    crit = SmoothCrossEntropyLoss(smoothing=0.5)
    predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                                 [0, 0.9, 0.2, 0.2, 1],
                                 [1, 0.2, 0.7, 0.9, 1]])
    v = crit(Variable(predict), Variable(torch.LongTensor([2, 1, 0])))
    print(v)

Output:

tensor(1.4178)
tensor(1.4178)
In this second way, the loss accepts the plain (integer) target vector, and the user doesn't manually smooth it; instead, the module takes care of the label smoothing internally. It allows us to implement label smoothing in terms of F.nll_loss.
(a) Wangleiofficial: Source - (AFAIK) original poster
(b) Datasaurus: Source - added weighting support
Further, we have slightly trimmed the code write-up to make it more concise.
class LabelSmoothingLoss(torch.nn.Module):
    def __init__(self, smoothing: float = 0.1, reduction="mean", weight=None):
        super(LabelSmoothingLoss, self).__init__()
        self.smoothing = smoothing
        self.reduction = reduction
        self.weight = weight

    def reduce_loss(self, loss):
        return loss.mean() if self.reduction == 'mean' else loss.sum() \
            if self.reduction == 'sum' else loss

    def linear_combination(self, x, y):
        return self.smoothing * x + (1 - self.smoothing) * y

    def forward(self, preds, target):
        assert 0 <= self.smoothing < 1

        if self.weight is not None:
            self.weight = self.weight.to(preds.device)

        n = preds.size(-1)
        log_preds = F.log_softmax(preds, dim=-1)
        loss = self.reduce_loss(-log_preds.sum(dim=-1))
        nll = F.nll_loss(
            log_preds, target, reduction=self.reduction, weight=self.weight
        )
        return self.linear_combination(loss / n, nll)
(c) NVIDIA: Source

class LabelSmoothing(nn.Module):
    """NLL loss with label smoothing."""

    def __init__(self, smoothing=0.0):
        """Constructor for the LabelSmoothing module.

        :param smoothing: label smoothing factor
        """
        super(LabelSmoothing, self).__init__()
        self.confidence = 1.0 - smoothing
        self.smoothing = smoothing

    def forward(self, x, target):
        logprobs = torch.nn.functional.log_softmax(x, dim=-1)
        nll_loss = -logprobs.gather(dim=-1, index=target.unsqueeze(1))
        nll_loss = nll_loss.squeeze(1)
        smooth_loss = -logprobs.mean(dim=-1)
        loss = self.confidence * nll_loss + self.smoothing * smooth_loss
        return loss.mean()
Check
if __name__ == "__main__":
    # Wangleiofficial
    crit = LabelSmoothingLoss(smoothing=0.3, reduction="mean")
    predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                                 [0, 0.9, 0.2, 0.2, 1],
                                 [1, 0.2, 0.7, 0.9, 1]])
    v = crit(Variable(predict), Variable(torch.LongTensor([2, 1, 0])))
    print(v)

    # NVIDIA
    crit = LabelSmoothing(smoothing=0.3)
    predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                                 [0, 0.9, 0.2, 0.2, 1],
                                 [1, 0.2, 0.7, 0.9, 1]])
    v = crit(Variable(predict), Variable(torch.LongTensor([2, 1, 0])))
    print(v)

Output:

tensor(1.3883)
tensor(1.3883)
Update: label smoothing is now supported natively in PyTorch (since version 1.10) via the label_smoothing argument of torch.nn.CrossEntropyLoss:

torch.nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean', label_smoothing=0.0)
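So on PyTorch 1.10 or newer, the simplest plug-and-play option is the built-in argument. A minimal sketch (0.1 is just a commonly used smoothing value, not a recommendation; 196 is the Stanford Cars class count from the question):

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

logits = torch.randn(8, 196)             # raw model outputs, e.g. from ResNet-18
targets = torch.randint(0, 196, (8,))    # integer class labels
loss = criterion(logits, targets)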