In the PyTorch tutorial, the constructed network is
Net(
(conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
(conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
(fc1): Linear(in_features=400, out_features=120, bias=True)
(fc2): Linear(in_features=120, out_features=84, bias=True)
(fc3): Linear(in_features=84, out_features=10, bias=True)
)
It is used to process images with dimensions 1x32x32. They mention that the network cannot be used with images of a different size.
The two convolutional layers seem to allow for an arbitrary number of features, so the linear layers seem to be responsible for reducing the 32x32 input down to the 10 final features.
I do not really understand how the numbers 120 and 84 are chosen there, and why the result matches the input dimensions.
And when I try to construct a similar network, I actually run into a problem with the dimensions of the data.
When I, for example, use a simpler network:
Net(
(conv1): Conv2d(3, 8, kernel_size=(5, 5), stride=(1, 1))
(conv2): Conv2d(8, 16, kernel_size=(5, 5), stride=(1, 1))
(fc1): Linear(in_features=400, out_features=3, bias=True)
)
for an input of size 3x1200x800, I get the error message:
RuntimeError: size mismatch, m1: [1 x 936144], m2: [400 x 3] at /pytorch/aten/src/TH/generic/THTensorMath.cpp:940
Where does the number 936144 come from, and how do I need to design the network so that the dimensions match?
PyTorch - nn.Linear
nn.Linear(n, m) is a module that creates a single-layer feed-forward network with n inputs and m outputs. Mathematically, it performs a matrix multiplication plus a bias: y = x A^T + b, where x is the input, A is the weight matrix and b is the bias.
Linear layers use matrix multiplication to transform their input features into output features. The input features received by a linear layer are passed as a flattened one-dimensional tensor (per sample) and are then multiplied by the weight matrix.
The input size (in_features) is the number of features per example in your data.
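A minimal sketch of that behaviour (the layer size and batch size here are just illustrative):
import torch
import torch.nn as nn

lin = nn.Linear(400, 3)   # 400 input features -> 3 output features
x = torch.randn(1, 400)   # a batch with one flattened sample
y = lin(x)                # equivalent to x @ lin.weight.T + lin.bias
print(y.shape)            # torch.Size([1, 3])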
The key step is between the last convolution and the first Linear block. Conv2d outputs a tensor of shape [batch_size, n_features_conv, height, width], whereas Linear expects [batch_size, n_features_lin]. To make the two align you need to "stack" the three dimensions [n_features_conv, height, width] into one [n_features_lin]. It follows that n_features_lin == n_features_conv * height * width.
. In the original code this "stacking" is achieved by
x = x.view(-1, self.num_flat_features(x))
and if you inspect num_flat_features it just computes this n_features_conv * height * width product. In other words, your first Linear layer must have num_flat_features(x) input features, where x is the tensor coming out of the preceding convolution. But we need to calculate this value ahead of time, so that we can initialize the network in the first place...
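For reference, a sketch of what num_flat_features computes (modelled on the tutorial's helper):
def num_flat_features(self, x):
    size = x.size()[1:]      # all dimensions except the batch dimension
    num_features = 1
    for s in size:
        num_features *= s    # channels * height * width
    return num_features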
The calculation follows from inspecting the operations one by one.
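Here is that bookkeeping as a standalone sketch for the tutorial's network, assuming (as in the tutorial's forward) a 2x2 max pooling after each convolution:
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 1, 32, 32)        # batch of one 1x32x32 image
x = F.relu(nn.Conv2d(1, 6, 5)(x))    # 5x5 kernel, no padding -> 1 x 6 x 28 x 28
x = F.max_pool2d(x, 2)               # 2x2 pooling -> 1 x 6 x 14 x 14
x = F.relu(nn.Conv2d(6, 16, 5)(x))   # 5x5 kernel, no padding -> 1 x 16 x 10 x 10
x = F.max_pool2d(x, 2)               # 2x2 pooling -> 1 x 16 x 5 x 5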
That final 5x5 is why in the tutorial you see self.fc1 = nn.Linear(16 * 5 * 5, 120): it's n_features_conv * height * width when starting from a 32x32 image. If you want to use a different input size, you have to redo the above calculation and adjust your first Linear layer accordingly.
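Applying the same bookkeeping to the network from the question explains where 936144 comes from, assuming (as that number suggests) that the forward pass also applies a 2x2 max pooling after each convolution:
# input: 3 x 1200 x 800
# conv1 (5x5, no padding):  8 x 1196 x 796
# max_pool2d (2x2):         8 x  598 x 398
# conv2 (5x5, no padding): 16 x  594 x 394
# max_pool2d (2x2):        16 x  297 x 197
# flattened: 16 * 297 * 197 = 936144
self.fc1 = nn.Linear(16 * 297 * 197, 3)  # in __init__, instead of nn.Linear(400, 3)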
For the further operations it's just a chain of matrix multiplications (that's what Linear does). So the only rule is that the n_features_out of the previous Linear matches the n_features_in of the next one. The values 120 and 84 are entirely arbitrary, though they were probably chosen by the author such that the resulting network performs well.
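Concretely, that is the tutorial's chain of linear layers:
self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 400 inputs -> 120 outputs
self.fc2 = nn.Linear(120, 84)          # must accept fc1's 120 outputs
self.fc3 = nn.Linear(84, 10)           # must accept fc2's 84 outputs; 10 output classes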