How do I load up an image and convert it to a proper tensor for PyTorch?

I'm trying to custom-load some image files (JPGs) with labels and feed them into a convolutional neural network (CNN) in PyTorch, following the example here. However, there still seem to be no decent end-to-end tutorials. The error that I am seeing is the following.

RuntimeError: thnn_conv2d_forward is not implemented for type
torch.ByteTensor

My Dataset looks like the following.

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset

class ImageData(Dataset):
    def __init__(self, width=256, height=256, transform=None):
        self.width = width
        self.height = height
        self.transform = transform
        y, x = get_images() #y is a list of labels, x is a list of file paths
        self.y = y
        self.x = x

    def __getitem__(self, index):
        img = Image.open(self.x[index]) # use Pillow to open the file
        img = img.resize((self.width, self.height)) # resize the image to 256x256
        img = img.convert('RGB') # ensure the image has three RGB channels
        if self.transform is not None:
            img = self.transform(img)

        img = np.asarray(img).transpose(-1, 0, 1) # change the dimensions from height x width x channel (HWC) to channel x height x width (CHW)
        img = torch.from_numpy(img) # create the image tensor
        label = torch.from_numpy(np.asarray(self.y[index]).reshape([1, 1])) # create the label tensor
        return img, label

    def __len__(self):
        return len(self.x)
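
Fetching a single item shows the dtype involved (a quick check, not part of the tutorial code):

data = ImageData()
img, label = data[0]
print(img.dtype) # torch.uint8, so batches become ByteTensors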

The CNN is taken from here and is modified to handle NCHW (batch x channel x height x width) input as follows.

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 256, 256)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

The learning loop is also taken from the same tutorial and looks like the following.
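
For completeness, net, criterion, and optimizer are defined as in that tutorial:

import torch.optim as optim

net = Net()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)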

for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(dataloader, 0):
        # get the inputs
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')

However, the RuntimeError mentioned above is thrown. Any ideas on what I am doing wrong?

Additionally, I know that without transposing, the image data is in HWC (height x width x channel) shape, but the NN model requires it as CHW. The problem is that once we change from HWC to CHW, we can no longer simply plot the images as we iterate over the DataLoader.

import matplotlib.pyplot as plt

data = ImageData()
dataloader = DataLoader(data, batch_size=10, shuffle=True, num_workers=1)
imgs, labels = next(iter(dataloader))
plt.imshow(imgs.numpy()[0,:,:,:])
plt.show()

Attempting to do so throws the following error.

TypeError: Invalid dimensions for image data

To me, it is a nuisance that Pillow gives you HWC, which you can plot directly, while the PyTorch CNN wants CHW for processing. Any idea on how to consistently or easily avoid so many transforms while still being able to both plot the data and feed it into the CNN? Or is this mismatch of HWC vs. CHW just something we have to live with?
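
The only workaround I know is to permute back to HWC just for plotting, which is exactly the kind of extra hop I would rather avoid (a sketch):

imgs, labels = next(iter(dataloader))
plt.imshow(imgs[0].permute(1, 2, 0).numpy()) # CHW -> HWC for matplotlib
plt.show()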

Without transposing the image before feeding it to the CNN, the following error is thrown.

RuntimeError: Given groups=1, weight[256, 3, 256, 256], so expected
input[10, 256, 256, 3] to have 3 channels, but got 256 channels instead

Asked by Jane Wayne on Nov 07 '22


1 Answer

conv2d operates on float tensors, while loading an image with Pillow and converting it through NumPy gives you uint8 data, i.e. a ByteTensor. It's also common and good practice to normalize input images before passing them into the neural network.

I would add the line img = img / 255 immediately before you convert it to a Torch tensor in __getitem__; it will then be a float tensor rather than a byte tensor and thus be compatible with conv2d. One caveat: NumPy's division yields float64, so cast with .astype(np.float32) to match the model's default float32 weights.
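
Alternatively, torchvision's transforms.ToTensor handles both the scaling to [0, 1] and the HWC-to-CHW transpose in one step, so the manual division and transpose can be dropped. A sketch of __getitem__ using it (reusing the Dataset above):

from torchvision import transforms

def __getitem__(self, index):
    img = Image.open(self.x[index])
    img = img.resize((self.width, self.height)).convert('RGB')
    if self.transform is not None:
        img = self.transform(img)
    img = transforms.ToTensor()(img) # float32 CHW tensor in [0, 1]
    label = torch.from_numpy(np.asarray(self.y[index]).reshape([1, 1]))
    return img, label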

Answered by James on Nov 14 '22