 

What are C classes for a NLLLoss loss function in Pytorch?

I'm asking about C classes for a NLLLoss loss function.

The documentation states:

The negative log likelihood loss. It is useful to train a classification problem with C classes.

Basically everything after that point depends upon you knowing what a C class is, and I thought I knew what a C class was but the documentation doesn't make much sense to me. Especially when it describes the expected inputs of (N, C) where C = number of classes. That's where I'm confused, because I thought a C class refers to the output only. My understanding was that the C class was a one hot vector of classifications. I've often found in tutorials that the NLLLoss was often paired with a LogSoftmax to solve classification problems.

I was expecting to use NLLLoss in the following example:

import torch
import torch.nn as nn

# Some random training data
input = torch.randn(5, requires_grad=True)
print(input)  # tensor([-1.3533, -1.3074, -1.7906,  0.3113,  0.7982], requires_grad=True)
# Build my NN (here it's just a LogSoftmax)
m = nn.LogSoftmax(dim=0)
# Train my NN with the data
output = m(input)
print(output)  # tensor([-2.8079, -2.7619, -3.2451, -1.1432, -0.6564], grad_fn=<LogSoftmaxBackward>)
loss = nn.NLLLoss()
print(loss(output, torch.tensor([1, 0, 0])))

The above raises the following error on the last line:

ValueError: Expected 2 or more dimensions (got 1)

We can ignore the error, because clearly I don't understand what I'm doing. Here I'll explain my intentions for the above source code.

input = torch.randn(5, requires_grad=True)

Random 1D array to pair with the one hot vector of [1, 0, 0] for training. I'm trying to map binary bits to a one hot vector of decimal numbers.

m = nn.LogSoftmax(dim=0)

The documentation for LogSoftmax says that the output will be the same shape as the input, but I've only seen examples of LogSoftmax(dim=1), and I've been stuck trying to make this work because I can't find a relevant example.

print(loss(output, torch.tensor([1, 0, 0])))

So now I have the output of the NN, and I want to know the loss from my classification [1, 0, 0]. It doesn't really matter in this example what any of the data is. I just want a loss for a one hot vector that represents classification.

At this point I get stuck trying to resolve errors from the loss function relating to expected output and input structures. I've tried using view(...) on the output and input to fix the shape, but that just gets me other errors.

So this goes back to my original question and I'll show the example from the documentation to explain my confusion:

m = nn.LogSoftmax(dim=1)
loss = nn.NLLLoss()
input = torch.randn(3, 5, requires_grad=True)
train = torch.tensor([1, 0, 4])
print('input', input)  # input tensor([[...],[...],[...]], requires_grad=True)
output = m(input)
print('train', output, train)  # tensor([[...],[...],[...]],grad_fn=<LogSoftmaxBackward>) tensor([1, 0, 4])
x = loss(output, train)

Again, we have dim=1 on LogSoftmax which confuses me now, because look at the input data. It's a 3x5 tensor and I'm lost.

Here's the documentation on the first input for the NLLLoss function:

Input: (N, C) where C = number of classes

The inputs are grouped by the number of classes?

So each row of the tensor input is associated with each element of the training tensor?

If I change the second dimension of the input tensor, then nothing breaks and I don't understand what is going on.

input = torch.randn(3, 100, requires_grad=True)
# 3 x 100 still works?

So I don't understand what a C class is here, and I thought a C class was a classification (like a label) and meaningful only on the outputs of the NN.

I hope you understand my confusion, because shouldn't the shape of the inputs for the NN be independent from the shape of the one hot vector used for classification?

Both the code examples and the documentation say that the shape of the inputs is defined by the number of classifications, and I don't really understand why.

I have tried to study the documentation and tutorials to understand what I'm missing, but after several days of not being able to get past this point I've decided to ask this question. It's been humbling, because I thought this was going to be one of the easier things to learn.

2 Answers

Basically you are missing the concept of a batch.

Long story short, every input to the loss (and the one passed through the network) requires a batch dimension (i.e. how many samples are used).

Breaking it up, step by step:

Your example vs documentation

Each step will be compared side by side to make it clearer (documentation on top, your example below).

Inputs

input = torch.randn(3, 5, requires_grad=True)
input = torch.randn(5, requires_grad=True)

In the first case (docs), an input with 5 features is created and 3 samples are used. In your case there is only a batch dimension (5 samples); you have no features, which are required. If you meant to have one sample with 5 features you should do:

input = torch.randn(1, 5, requires_grad=True)

LogSoftmax

LogSoftmax is applied across the features dimension; you are applying it across the batch.

m = nn.LogSoftmax(dim=1)  # apply over features
m = nn.LogSoftmax(dim=0)  # apply over batch

The latter usually makes no sense, as samples are independent of each other.
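
As a quick sketch (my own illustration, not part of the original answer), you can check which dimension gets normalized into a probability distribution:

import torch
import torch.nn as nn

x = torch.randn(3, 5)  # 3 samples (batch), 5 features/classes
per_sample = nn.LogSoftmax(dim=1)(x)
print(per_sample.exp().sum(dim=1))  # tensor([1., 1., 1.]) -- each sample's class scores sum to 1
per_batch = nn.LogSoftmax(dim=0)(x)
print(per_batch.exp().sum(dim=0))  # tensor([1., 1., 1., 1., 1.]) -- normalizes across samples instead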

Targets

As this is multiclass classification and each element in the target vector represents a sample, you can pass as many numbers as you want (as long as each is smaller than the number of features; in the documentation example it's 5, hence [0-4] is fine).

train = torch.tensor([1, 0, 4])
train = torch.tensor([1, 0, 0])

I assume you wanted to pass a one-hot vector as the target as well. PyTorch doesn't work that way, as it's memory inefficient (why store everything one-hot encoded when you can just pinpoint the class exactly; in your case it would be 0).

Only the outputs of the neural network are one-hot encoded in order to backpropagate the error through all output nodes; it's not needed for the targets.
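
If your labels currently live as one-hot vectors, here is a small sketch (my own addition, not from the answer) of converting them into the index form that NLLLoss expects:

import torch

one_hot = torch.tensor([[1, 0, 0], [0, 0, 1]])  # two samples, 3 classes, one-hot encoded
targets = one_hot.argmax(dim=1)  # tensor([0, 2]) -- plain class indices, shape (N,)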

Final

You shouldn't use torch.nn.LogSoftmax at all for this task. Just use torch.nn.Linear as the last layer and use torch.nn.CrossEntropyLoss with your targets.
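
A minimal sketch of that recommendation (the layer sizes here are illustrative: 5 input features, 3 classes):

import torch
import torch.nn as nn

model = nn.Linear(5, 3)  # last layer outputs raw scores (logits), one per class
criterion = nn.CrossEntropyLoss()  # applies LogSoftmax + NLLLoss internally

inputs = torch.randn(4, 5)  # batch of 4 samples, 5 features each
targets = torch.tensor([0, 2, 1, 0])  # class indices, not one-hot vectors

loss = criterion(model(inputs), targets)
loss.backward()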



I agree with you that the documentation for nn.NLLLoss() is far from ideal, but I think we can clear up your confusion here, firstly by clarifying that "class" is often used as a synonym for "category" in a Machine Learning context.

Therefore, when PyTorch is talking about C classes, it is actually referring to the number of distinct categories that you are trying to train your network on. So, in the classical example of a categorical neural network trying to classify between "cats" and "dogs", C = 2, since it is either a cat or dog.

Specifically for this classification problem, it also holds that we only have one single truth value over the array of our categories (a picture cannot depict both a cat AND a dog, but always only either one), which is why we can conveniently indicate the corresponding category of an image by its index (let's say that 0 would indicate a cat, and 1 a dog). Now, we can simply compare the network output to the category we want.

BUT, in order for this to work, we also need to be clear about what these loss values are referring to (in our network output), since our network will generally make predictions via a softmax over different output neurons, meaning that we generally have more than a single value. Fortunately, PyTorch's nn.NLLLoss does this automatically for you.

Your above example with the LogSoftmax in fact only produces a single output value, which is a critical case for this example. This way, you basically only have an indication of whether or not something exists, which doesn't make much sense to use in a classification example; it would fit a regression case better (but that would require a totally different loss function to begin with).

Last, but not least, you should also consider the fact that we generally have 2D tensors as input, since batching (the simultaneous computation of multiple samples) is generally considered a necessary step for performance. Even if you choose a batch size of 1, this still requires your inputs to be of dimension (batch_size, input_dimensions), and consequently your output tensors to be of shape (batch_size, number_of_categories).

This explains why most of the examples you find online are performing the LogSoftmax() over dim=1, since this is the "in-distribution axis", and not the batch axis (which would be dim=0).

If you simply want to fix your problem, the easiest way would be to extend your random tensor by an additional dimension (torch.randn([1, 5], requires_grad=True)), and then to compare against only one value in your target tensor (print(loss(output, torch.tensor([1])))).
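
Putting that suggestion together with the original snippet, a sketch of the fixed version looks like this:

import torch
import torch.nn as nn

input = torch.randn(1, 5, requires_grad=True)  # batch of 1 sample, 5 class scores
m = nn.LogSoftmax(dim=1)  # normalize over the class dimension, not the batch
output = m(input)
loss = nn.NLLLoss()
print(loss(output, torch.tensor([1])))  # target: class index 1 for the single sample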
