I'm a newbie trying to make this PyTorch CNN work with the Cats&Dogs dataset from kaggle. As there are no targets for the test images, I manually classified some of the test images and put the class in the filename, to be able to test (maybe should have just used some of the train images).
I used the torchvision.datasets.ImageFolder class to load the train and test images. The training seems to work.
But what do I need to do to make the test-routine work? I don't know, how to connect my test_data_loader with the test loop at the bottom, via test_x and test_y.
The Code is based on this MNIST example CNN. There, something like this is used right after the loaders are created. But I failed to rewrite it for my dataset:
test_x = Variable(torch.unsqueeze(test_data.test_data, dim=1), volatile=True).type(torch.FloatTensor)[:2000]/255. # shape from (2000, 28, 28) to (2000, 1, 28, 28), value in range(0,1)
test_y = test_data.test_labels[:2000]
The Code:
import os
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
import torch.utils.data as data
import torchvision
from torchvision import transforms
EPOCHS = 2
BATCH_SIZE = 10
LEARNING_RATE = 0.003
TRAIN_DATA_PATH = "./train_cl/"
TEST_DATA_PATH = "./test_named_cl/"
TRANSFORM_IMG = transforms.Compose([
transforms.Resize(256),
transforms.CenterCrop(256),
transforms.ToTensor(),
transforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225] )
])
train_data = torchvision.datasets.ImageFolder(root=TRAIN_DATA_PATH, transform=TRANSFORM_IMG)
train_data_loader = data.DataLoader(train_data, batch_size=BATCH_SIZE, shuffle=True, num_workers=4)
test_data = torchvision.datasets.ImageFolder(root=TEST_DATA_PATH, transform=TRANSFORM_IMG)
test_data_loader = data.DataLoader(test_data, batch_size=BATCH_SIZE, shuffle=True, num_workers=4)
class CNN(nn.Module):
# omitted...
if __name__ == '__main__':
print("Number of train samples: ", len(train_data))
print("Number of test samples: ", len(test_data))
print("Detected Classes are: ", train_data.class_to_idx) # classes are detected by folder structure
model = CNN()
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
loss_func = nn.CrossEntropyLoss()
# Training and Testing
for epoch in range(EPOCHS):
for step, (x, y) in enumerate(train_data_loader):
b_x = Variable(x) # batch x (image)
b_y = Variable(y) # batch y (target)
output = model(b_x)[0]
loss = loss_func(output, b_y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
# Test -> this is where I have no clue
if step % 50 == 0:
test_x = Variable(test_data_loader)
test_output, last_layer = model(test_x)
pred_y = torch.max(test_output, 1)[1].data.squeeze()
accuracy = sum(pred_y == test_y) / float(test_y.size(0))
print('Epoch: ', epoch, '| train loss: %.4f' % loss.data[0], '| test accuracy: %.2f' % accuracy)
ImageFolder class: Responsible for loading images from train and val folders into a PyTorch dataset. DataLoader class: Enables us to wrap an iterable around our dataset so that data samples can be efficiently accessed by our deep learning model.
Dataset stores the samples and their corresponding labels, and DataLoader wraps an iterable around the Dataset to enable easy access to the samples.
Creating a PyTorch Dataset and managing it with Dataloader keeps your data manageable and helps to simplify your machine learning pipeline. a Dataset stores all your data, and Dataloader is can be used to iterate through the data, manage batches, transform the data, and much more.
Looking at the data from Kaggle and your code, it seems that there are problems in your data loading, both train and test set. First of all, the data should be in a different folder per label for the default PyTorch ImageFolder
to load it correctly. In your case, since all the training data is in the same folder, PyTorch is loading it as one class and hence learning seems to be working. You can correct this by using a folder structure like - train/dog
, - train/cat
, - test/dog
, - test/cat
and then passing the train and the test folder to the train and test ImageFolder
respectively. The training code seems fine, just change the folder structure and you should be good. Take a look at the official documentation of ImageFolder which has a similar example.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With