Calculating input and output size for Conv2d in PyTorch for image classification

Question

I'm trying to run the PyTorch tutorial on CIFAR10 image classification here - http://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#sphx-glr-beginner-blitz-cifar10-tutorial-py

I've made a small change and I'm using a different dataset. I have images from the Wikiart dataset that I want to classify by artist (label = artist name).

Here is the code for the Net -

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16*5*5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16*5*5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

Then there is this section of the code where I start training the Net.

for epoch in range(2):
    running_loss = 0.0

    for i, data in enumerate(wiki_train_dataloader, 0):
        inputs, labels = data['image'], data['class']
        print(inputs.shape)
        inputs, labels = Variable(inputs), Variable(labels)

        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.data[0]
        if i % 2000 == 1999:  # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
              (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

This line print(inputs.shape) gives me torch.Size([4, 32, 32, 3]) with my Wikiart dataset whereas in the original example with CIFAR10, it prints torch.Size([4, 3, 32, 32]).

Now, I'm not sure how to change the Conv2d in my Net to be compatible with torch.Size([4, 32, 32, 3]).

I get this error:

RuntimeError: Given input size: (3 x 32 x 3). Calculated output size: (6 x 28 x -1). Output size is too small at /opt/conda/conda-bld/pytorch_1503965122592/work/torch/lib/THNN/generic/SpatialConvolutionMM.c:45

While reading the images for the Wikiart dataset, I resize them to (32, 32) and these are 3-channel images.

Things I tried:

1) The CIFAR10 tutorial uses a transform which I am not using. I could not incorporate the same into my code.

2) Changing self.conv2 = nn.Conv2d(6, 16, 5) to self.conv2 = nn.Conv2d(3, 6, 5). This gave me the same error as above. I was only changing this to see if the error message changes.

Any resources on how to calculate input & output sizes in PyTorch or automatically reshape Tensors would be really appreciated. I just started learning Torch & I find the size calculations complicated.

Egor Lakomkin · Accepted Answer

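Conv2d expects its input in the format (batch, channels, height, width). Your tensor is currently (B, H, W, C) = (4, 32, 32, 3), so the channel axis has to move from the last position to the second one to give (B, C, H, W) = (4, 3, 32, 32). You can do it this way:

inputs, labels = Variable(inputs), Variable(labels)
inputs = inputs.permute(0, 3, 1, 2)  # (B, H, W, C) -> (B, C, H, W)
... the rest

Note that inputs.transpose(1, 3) also produces a shape the network accepts, but it swaps only those two axes and yields (B, C, W, H), so every image ends up mirrored along its diagonal.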
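To see where the -1 in the error message and the 16*5*5 in fc1 come from, the spatial size can be traced layer by layer. The sketch below uses the output-size formula given in the PyTorch documentation for Conv2d and MaxPool2d (floor mode); the helper names conv_out and trace_net are made up for this illustration and are not part of PyTorch.

```python
def conv_out(size, kernel, stride=1, padding=0, dilation=1):
    """Output length along one spatial axis for Conv2d / MaxPool2d (floor mode)."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def trace_net(height, width):
    """Trace (H, W) through conv1 -> pool -> conv2 -> pool of the Net in the question."""
    shapes = []
    # (kernel, stride) for Conv2d(.., .., 5), MaxPool2d(2, 2), Conv2d(.., .., 5), MaxPool2d(2, 2)
    for kernel, stride in [(5, 1), (2, 2), (5, 1), (2, 2)]:
        height = conv_out(height, kernel, stride)
        width = conv_out(width, kernel, stride)
        shapes.append((height, width))
    return shapes


# Correct layout (B, C, H, W) = (4, 3, 32, 32): H = W = 32
print(trace_net(32, 32))  # [(28, 28), (14, 14), (10, 10), (5, 5)]

# Layout (4, 32, 32, 3) is read as H = 32, W = 3, and a 5-wide kernel does not fit
print(conv_out(32, 5), conv_out(3, 5))  # 28 -1
```

The last traced shape is 5 x 5 with 16 channels from conv2, which is why fc1 takes 16*5*5 = 400 inputs. If the images were resized to something other than 32 x 32, that number would have to be recomputed the same way.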

balbok · Answer

I know it is an old question, but I stumbled upon this again when working with non-standard kernel sizes, dilations, etc. Here is a function I came up with, which does the calculation for me and checks for a given output shape:

def find_settings(shape_in, shape_out, kernel_sizes, dilation_sizes, padding_sizes, stride_sizes, transpose=False):
    from itertools import product

    import torch
    from torch import nn

    import numpy as np

    # Fake input
    x_in = torch.tensor(np.random.randn(4, 1, shape_in, shape_in), dtype=torch.float)

    # Grid search through all combinations
    for kernel, dilation, padding, stride in product(kernel_sizes, dilation_sizes, padding_sizes, stride_sizes):
        # Define a layer
        if transpose:
            layer = nn.ConvTranspose2d
        else:
            layer = nn.Conv2d
        layer = layer(
                1, 1,
                (4, kernel),
                stride=(2, stride),
                padding=(2, padding),
                dilation=(2, dilation)
            )

        # Check if layer is valid for given input shape
        try:
            x_out = layer(x_in)
        except Exception:
            continue

        # Check for shape of out tensor
        result = x_out.shape[-1]

        if shape_out == result:
            print('Correct shape for:\n ker: {}\n dil: {}\n pad: {}\n str: {}\n'.format(kernel, dilation, padding, stride))

Here is an example usage of it:

transpose = True
shape_in = 128
shape_out = 1024


kernel_sizes = [3, 4, 5, 7, 9, 11]
dilation_sizes = list(range(1, 20))
padding_sizes = list(range(15))
stride_sizes = list(range(4, 16))
find_settings(shape_in, shape_out, kernel_sizes, dilation_sizes, padding_sizes, stride_sizes, transpose)

I hope it can help people in the future with this problem. Note that it's not parallelized, and if given a lot of choices it can run for a while.
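If the run time becomes a problem, the same search can probably be done without building any layers, using the output-size formulas from the PyTorch documentation for Conv2d and ConvTranspose2d. The following is a minimal sketch under some assumptions: it looks at a single spatial dimension only, assumes output_padding=0 for the transposed case, and does not replicate every validity check PyTorch performs. The names conv_out, conv_transpose_out and find_settings_closed_form are hypothetical helpers, not library functions.

```python
from itertools import product


def conv_out(size, kernel, stride=1, padding=0, dilation=1):
    """Output length of Conv2d along one spatial axis."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def conv_transpose_out(size, kernel, stride=1, padding=0, dilation=1, output_padding=0):
    """Output length of ConvTranspose2d along one spatial axis."""
    return (size - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1


def find_settings_closed_form(shape_in, shape_out, kernel_sizes, dilation_sizes,
                              padding_sizes, stride_sizes, transpose=False):
    """Return every (kernel, dilation, padding, stride) that maps shape_in to shape_out."""
    size_fn = conv_transpose_out if transpose else conv_out
    matches = []
    for kernel, dilation, padding, stride in product(
            kernel_sizes, dilation_sizes, padding_sizes, stride_sizes):
        if size_fn(shape_in, kernel, stride, padding, dilation) == shape_out:
            matches.append((kernel, dilation, padding, stride))
    return matches


# Same search space as the example usage above
matches = find_settings_closed_form(
    128, 1024,
    kernel_sizes=[3, 4, 5, 7, 9, 11],
    dilation_sizes=list(range(1, 20)),
    padding_sizes=list(range(15)),
    stride_sizes=list(range(4, 16)),
    transpose=True,
)
for kernel, dilation, padding, stride in matches:
    print('ker: {} dil: {} pad: {} str: {}'.format(kernel, dilation, padding, stride))
```

Because this is plain integer arithmetic, it is cheap enough to scan much larger grids; any candidate it reports can then be confirmed by constructing the actual layer once.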
