I'm trying to run the PyTorch tutorial on CIFAR10 image classification here - http://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#sphx-glr-beginner-blitz-cifar10-tutorial-py
I've made a small change and I'm using a different dataset. I have images from the Wikiart dataset that I want to classify by artist (label = artist name).
Here is the code for the Net -
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
Then there is this section of the code where I start training the Net.
for epoch in range(2):
    running_loss = 0.0
    for i, data in enumerate(wiki_train_dataloader, 0):
        inputs, labels = data['image'], data['class']
        print(inputs.shape)
        inputs, labels = Variable(inputs), Variable(labels)
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.data[0]
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0
The line print(inputs.shape) gives me torch.Size([4, 32, 32, 3]) with my Wikiart dataset, whereas the original CIFAR10 example prints torch.Size([4, 3, 32, 32]).
Now I'm not sure how to change the Conv2d layers in my Net to be compatible with torch.Size([4, 32, 32, 3]).
I get this error:
RuntimeError: Given input size: (3 x 32 x 3). Calculated output size: (6 x 28 x -1). Output size is too small at /opt/conda/conda-bld/pytorch_1503965122592/work/torch/lib/THNN/generic/SpatialConvolutionMM.c:45
While reading the images for the Wikiart dataset, I resize them to (32, 32); they are 3-channel images.
Things I tried:
1) The CIFAR10 tutorial uses a transform which I am not using. I could not incorporate it into my code (the tutorial's transform is shown after this list for reference).
2) Changing self.conv2 = nn.Conv2d(6, 16, 5) to self.conv2 = nn.Conv2d(3, 6, 5). This gave me the same error as above; I was only changing it to see if the error message changed.
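For reference, the transform from the CIFAR10 tutorial is approximately the following. Note that transforms.ToTensor converts an (H, W, C) image with values in [0, 255] to a (C, H, W) float tensor in [0.0, 1.0], which is exactly the axis order my network expects:

import torchvision.transforms as transforms

# Transform from the tutorial (reproduced here for reference):
# ToTensor reorders (H, W, C) -> (C, H, W) and scales to [0, 1];
# Normalize then shifts each channel to roughly [-1, 1].
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])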
Any resources on how to calculate input and output sizes in PyTorch, or on automatically reshaping tensors, would be really appreciated. I just started learning Torch and I find the size calculations complicated.
You have to reshape your input to the format (batch, channels, height, width). Currently your data is in the format (B, H, W, C), i.e. (4, 32, 32, 3), so you need to move the channel axis from the last position to the second to get (B, C, H, W). You can do it this way:

inputs, labels = Variable(inputs), Variable(labels)
inputs = inputs.permute(0, 3, 1, 2)  # (B, H, W, C) -> (B, C, H, W)
... the rest
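(A plain inputs.transpose(1, 3) would also give the right channel count, but it swaps height and width as well, silently transposing the image content; permute keeps the spatial axes in order.) Here is a minimal sanity check on a dummy batch, assuming the shapes from the question:

import torch

x = torch.randn(4, 32, 32, 3)   # (B, H, W, C), as printed in the question
x = x.permute(0, 3, 1, 2)       # -> (B, C, H, W)
print(x.shape)                  # torch.Size([4, 3, 32, 32])

Alternatively, if you load the images with torchvision's transforms.ToTensor (as the CIFAR10 tutorial does), the (H, W, C) -> (C, H, W) conversion happens in the data pipeline and no manual permute is needed.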
I know it is an old question, but I stumbled upon this again when working with non-standard kernel sizes, dilations, etc. Here is a function I came up with, which does the calculation for me and checks for a given output shape:
def find_settings(shape_in, shape_out, kernel_sizes, dilation_sizes, padding_sizes, stride_sizes, transpose=False):
    from itertools import product

    import numpy as np
    import torch
    from torch import nn

    # Fake input: a batch of 4 single-channel square images
    x_in = torch.tensor(np.random.randn(4, 1, shape_in, shape_in), dtype=torch.float)

    # Grid search through all combinations of square (symmetric) settings
    for kernel, dilation, padding, stride in product(kernel_sizes, dilation_sizes, padding_sizes, stride_sizes):
        # Define a layer with the candidate settings
        layer_cls = nn.ConvTranspose2d if transpose else nn.Conv2d
        layer = layer_cls(1, 1, kernel, stride=stride, padding=padding, dilation=dilation)

        # Skip combinations that are invalid for the given input shape
        try:
            x_out = layer(x_in)
        except Exception:
            continue

        # Report combinations that produce the desired output size
        if x_out.shape[-1] == shape_out:
            print('Correct shape for:\n ker: {}\n dil: {}\n pad: {}\n str: {}\n'.format(
                kernel, dilation, padding, stride))
Here is an example usage of it:
transpose = True
shape_in = 128
shape_out = 1024
kernel_sizes = [3, 4, 5, 7, 9, 11]
dilation_sizes = list(range(1, 20))
padding_sizes = list(range(15))
stride_sizes = list(range(4, 16))
find_settings(shape_in, shape_out, kernel_sizes, dilation_sizes, padding_sizes, stride_sizes, transpose)
I hope this helps people who run into the same problem in the future. Note that the search is not parallelized, and with many candidate values it can run for a while.
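For quick checks without a grid search, the closed-form formula from the nn.Conv2d documentation can be used directly. Here is a small helper (a minimal sketch; the function name is my own) applied to the network from the question:

import math

def conv2d_out_size(size_in, kernel, stride=1, padding=0, dilation=1):
    # Formula from the nn.Conv2d docs:
    # out = floor((in + 2*padding - dilation*(kernel - 1) - 1) / stride + 1)
    return math.floor((size_in + 2 * padding - dilation * (kernel - 1) - 1) / stride + 1)

# The question's Net on a 32x32 input:
s = conv2d_out_size(32, 5)   # conv1: 28
s = s // 2                   # pool:  14
s = conv2d_out_size(s, 5)    # conv2: 10
s = s // 2                   # pool:  5  -> hence fc1 = nn.Linear(16 * 5 * 5, 120)
print(s)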