 

What are C classes for a NLLLoss loss function in Pytorch?

I'm asking about C classes for a NLLLoss loss function.

The documentation states:

The negative log likelihood loss. It is useful to train a classification problem with C classes.

Basically everything after that point depends upon you knowing what a C class is, and I thought I knew what a C class was but the documentation doesn't make much sense to me. Especially when it describes the expected inputs of (N, C) where C = number of classes. That's where I'm confused, because I thought a C class refers to the output only. My understanding was that the C class was a one hot vector of classifications. I've often found in tutorials that the NLLLoss was often paired with a LogSoftmax to solve classification problems.

I was expecting to use NLLLoss in the following example:

import torch
import torch.nn as nn

# Some random training data
input = torch.randn(5, requires_grad=True)
print(input)  # tensor([-1.3533, -1.3074, -1.7906,  0.3113,  0.7982], requires_grad=True)
# Build my NN (here it's just a LogSoftmax)
m = nn.LogSoftmax(dim=0)
# Train my NN with the data
output = m(input)
print(output)  # tensor([-2.8079, -2.7619, -3.2451, -1.1432, -0.6564], grad_fn=<LogSoftmaxBackward>)
loss = nn.NLLLoss()
print(loss(output, torch.tensor([1, 0, 0])))

The above raises the following error on the last line:

ValueError: Expected 2 or more dimensions (got 1)

We can ignore the error, because clearly I don't understand what I'm doing. Here I'll explain my intentions for the above source code.

input = torch.randn(5, requires_grad=True)

Random 1D array to pair with the one hot vector of [1, 0, 0] for training. I'm trying to map binary bits to a one hot vector of decimal numbers.

m = nn.LogSoftmax(dim=0)

The documentation for LogSoftmax says that the output will be the same shape as the input, but I've only seen examples of LogSoftmax(dim=1), and I've been stuck trying to make this work because I can't find a relevant example.

print(loss(output, torch.tensor([1, 0, 0])))

So now I have the output of the NN, and I want to know the loss from my classification [1, 0, 0]. It doesn't really matter in this example what any of the data is. I just want a loss for a one hot vector that represents classification.

At this point I get stuck trying to resolve errors from the loss function relating to expected output and input structures. I've tried using view(...) on the output and input to fix the shape, but that just gets me other errors.

So this goes back to my original question and I'll show the example from the documentation to explain my confusion:

m = nn.LogSoftmax(dim=1)
loss = nn.NLLLoss()
input = torch.randn(3, 5, requires_grad=True)
train = torch.tensor([1, 0, 4])
print('input', input)  # input tensor([[...],[...],[...]], requires_grad=True)
output = m(input)
print('train', output, train)  # tensor([[...],[...],[...]],grad_fn=<LogSoftmaxBackward>) tensor([1, 0, 4])
x = loss(output, train)

Again, we have dim=1 on LogSoftmax which confuses me now, because look at the input data. It's a 3x5 tensor and I'm lost.

Here's the documentation on the first input for the NLLLoss function:

Input: (N, C) where C = number of classes

The inputs are grouped by the number of classes?

So each row of the tensor input is associated with each element of the training tensor?

If I change the second dimension of the input tensor, then nothing breaks and I don't understand what is going on.

input = torch.randn(3, 100, requires_grad=True)
# 3 x 100 still works?

So I don't understand what a C class is here, and I thought a C class was a classification (like a label) and meaningful only on the outputs of the NN.

I hope you understand my confusion, because shouldn't the shape of the inputs for the NN be independent from the shape of the one hot vector used for classification?

Both the code examples and the documentation say that the shape of the inputs is defined by the number of classifications, and I don't really understand why.

I have tried to study the documentation and tutorials to understand what I'm missing, but after several days of not being able to get past this point I've decided to ask this question. It's been humbling, because I thought this was going to be one of the easier things to learn.

2 Answers

Basically you are missing the concept of a batch.

Long story short, every input to the loss (and the one passed through the network) requires a batch dimension (i.e. how many samples are used).

Breaking it up, step by step:

Your example vs documentation

Each step will be compared side by side to make it clearer (documentation on top, your example below).

Inputs

input = torch.randn(3, 5, requires_grad=True)
input = torch.randn(5, requires_grad=True)

In the first case (docs), an input with 5 features is created and 3 samples are used. In your case there is only a batch dimension (5 samples); you have no features, which are required. If you meant to have one sample with 5 features you should do:

input = torch.randn(1, 5, requires_grad=True)

LogSoftmax

LogSoftmax is applied across the features dimension; you are applying it across the batch.

m = nn.LogSoftmax(dim=1)  # apply over features
m = nn.LogSoftmax(dim=0)  # apply over batch

The latter usually makes no sense, as samples are independent of each other.
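
As a quick sketch (my own illustration, not part of the original answer), you can check which dimension gets normalized into a probability distribution:

import torch
import torch.nn as nn

x = torch.randn(3, 5)  # 3 samples (batch), 5 features/classes
per_sample = nn.LogSoftmax(dim=1)(x)
print(per_sample.exp().sum(dim=1))  # tensor([1., 1., 1.]) -- each sample's class scores sum to 1
per_batch = nn.LogSoftmax(dim=0)(x)
print(per_batch.exp().sum(dim=0))  # tensor([1., 1., 1., 1., 1.]) -- normalizes across samples instead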

Targets

As this is multiclass classification and each element in the target vector represents a sample, you can pass as many numbers as you want (as long as each is smaller than the number of features; in the documentation example it's 5, hence [0-4] is fine).

train = torch.tensor([1, 0, 4])
train = torch.tensor([1, 0, 0])

I assume you wanted to pass a one-hot vector as the target as well. PyTorch doesn't work that way, as it's memory inefficient (why store everything one-hot encoded when you can just pinpoint the class exactly; in your case it would be 0).

Only the outputs of the neural network are one-hot encoded in order to backpropagate the error through all output nodes; it's not needed for the targets.
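
If your labels currently live as one-hot vectors, here is a small sketch (my own addition, not from the answer) of converting them into the index form that NLLLoss expects:

import torch

one_hot = torch.tensor([[1, 0, 0], [0, 0, 1]])  # two samples, 3 classes, one-hot encoded
targets = one_hot.argmax(dim=1)  # tensor([0, 2]) -- plain class indices, shape (N,)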

Final

You shouldn't use torch.nn.LogSoftmax at all for this task. Just use torch.nn.Linear as the last layer and use torch.nn.CrossEntropyLoss with your targets.
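
A minimal sketch of that recommendation (the layer sizes here are illustrative: 5 input features, 3 classes):

import torch
import torch.nn as nn

model = nn.Linear(5, 3)  # last layer outputs raw scores (logits), one per class
criterion = nn.CrossEntropyLoss()  # applies LogSoftmax + NLLLoss internally

inputs = torch.randn(4, 5)  # batch of 4 samples, 5 features each
targets = torch.tensor([0, 2, 1, 0])  # class indices, not one-hot vectors

loss = criterion(model(inputs), targets)
loss.backward()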



I agree with you that the documentation for nn.NLLLoss() is far from ideal, but I think we can clear up your confusion here, firstly by clarifying that "class" is often used as a synonym for "category" in a Machine Learning context.

Therefore, when PyTorch is talking about C classes, it is actually referring to the number of distinct categories that you are trying to train your network on. So, in the classical example of a categorical neural network trying to classify between "cats" and "dogs", C = 2, since it is either a cat or dog.

Specifically for this classification problem, it also holds that we only have one single truth value over the array of our categories (a picture cannot depict both a cat AND a dog, but always only either one), which is why we can conveniently indicate the corresponding category of an image by its index (let's say that 0 would indicate a cat, and 1 a dog). Now, we can simply compare the network output to the category we want.

BUT, in order for this to work, we also need to be clear about what these loss values are referring to (in our network output), since our network will generally make predictions via a softmax over different output neurons, meaning that we generally have more than a single value. Fortunately, PyTorch's nn.NLLLoss does this automatically for you.

Your above example with the LogSoftmax in fact only produces a single output value, which is a critical case for this example. This way, you basically only have an indication of whether or not something exists, which doesn't make much sense to use in a classification example; it would fit a regression case better (but that would require a totally different loss function to begin with).

Last, but not least, you should also consider the fact that we generally have 2D tensors as input, since batching (the simultaneous computation of multiple samples) is generally considered a necessary step for performance. Even if you choose a batch size of 1, this still requires your inputs to be of dimension (batch_size, input_dimensions), and consequently your output tensors to be of shape (batch_size, number_of_categories).

This explains why most of the examples you find online are performing the LogSoftmax() over dim=1, since this is the "in-distribution axis", and not the batch axis (which would be dim=0).

If you simply want to fix your problem, the easiest way would be to extend your random tensor by an additional dimension (torch.randn([1, 5], requires_grad=True)), and then to compare against only one value in your target tensor (print(loss(output, torch.tensor([1])))).
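
Putting that suggestion together with the original snippet, a sketch of the fixed version looks like this:

import torch
import torch.nn as nn

input = torch.randn(1, 5, requires_grad=True)  # batch of 1 sample, 5 class scores
m = nn.LogSoftmax(dim=1)  # normalize over the class dimension, not the batch
output = m(input)
loss = nn.NLLLoss()
print(loss(output, torch.tensor([1])))  # target: class index 1 for the single sample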
