Running through a dataloader in Pytorch using Google Colab

Tags: python-3.x, image, deep-learning, pytorch, image-recognition

I am trying to use PyTorch to classify a dataset of images of cats and dogs. So far my code downloads the data; its train folder contains two subfolders, "cats" and "dogs". I then load this data into a DataLoader and iterate through the batches, but the iteration step raises an error I don't understand.

Since this is Google Colab, the code also downloads the data and installs the libraries. Any other advice on my code so far would be appreciated as well.

!pip install torch
!pip install torchvision

from __future__ import print_function, division
import os
import torch
import pandas as pd
import numpy as np
# For showing and formatting images
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

# For importing datasets into pytorch
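import torchvision.datasets as dataset

# For image transforms such as ToTensor (used by the ImageFolder call below)
import torchvision.transforms as transforms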

# Used for dataloaders
import torch.utils.data as data

# For pretrained resnet34 model
import torchvision.models as models

# For optimisation function
import torch.nn as nn
import torch.optim as optim


!wget http://files.fast.ai/data/dogscats.zip
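!unzip dogscats.zip

# Assumed extraction folder: the archive should unpack to dogscats/train/cats and dogscats/train/dogs
PATH = "dogscats/"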

batch_size = 256

train_raw = dataset.ImageFolder(PATH+"train", transform=transforms.ToTensor())
train_loader = data.DataLoader(train_raw, batch_size=batch_size, shuffle=True)

for batch_idx, (data, target) in enumerate(train_loader):
  print("Data: ", batch_idx)

The error is raised by the loop on the last lines:

RuntimeError                              Traceback (most recent call last)
<ipython-input-66-c32dd0c1b880> in <module>()
----> 1 for batch_idx, (data, target) in enumerate(train_loader):
      2   print("Data: ", batch_idx)
      3 

/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.pyc in __next__(self)
    257         if self.num_workers == 0:  # same-process loading
    258             indices = next(self.sample_iter)  # may raise StopIteration
--> 259             batch = self.collate_fn([self.dataset[i] for i in indices])
    260             if self.pin_memory:
    261                 batch = pin_memory_batch(batch)

/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.pyc in default_collate(batch)
    133     elif isinstance(batch[0], collections.Sequence):
    134         transposed = zip(*batch)
--> 135         return [default_collate(samples) for samples in transposed]
    136 
    137     raise TypeError((error_msg.format(type(batch[0]))))

/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.pyc in default_collate(batch)
    110             storage = batch[0].storage()._new_shared(numel)
    111             out = batch[0].new(storage)
--> 112         return torch.stack(batch, 0, out=out)
    113     elif elem_type.__module__ == 'numpy' and elem_type.__name__ != 'str_' \
    114             and elem_type.__name__ != 'string_':

/usr/local/lib/python2.7/dist-packages/torch/functional.pyc in stack(sequence, dim, out)
     62     inputs = [t.unsqueeze(dim) for t in sequence]
     63     if out is None:
---> 64         return torch.cat(inputs, dim)
     65     else:
     66         return torch.cat(inputs, dim, out=out)

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 400 and 487 in dimension 2 at /pytorch/torch/lib/TH/generic/THTensorMath.c:2897

Thanks


asked Apr 17 '18 12:04

Christopher Ell

1 Answer

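I think the main problem is that the images have different sizes. The DataLoader's default collate function combines the samples of a batch with torch.stack, which requires every tensor to have the same shape. transforms.ToTensor() keeps each image at its original size, so stacking fails (in your error, two images are 400 and 487 pixels tall). I may have misunderstood ImageFolder, but I don't think you need to supply labels yourself: if the directory has one subfolder per class (train/cats and train/dogs), ImageFolder derives the labels from the folder names. I would also add a resize step to your transform so that every image from the folder ends up the same size, for example: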

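    import torchvision.transforms as transforms

    # ImageNet channel means/stds, matching pretrained models such as resnet34
    normalize = transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225])
    # Resize before ToTensor (older torchvision only resizes PIL images); Normalize needs a tensor
    transform = transforms.Compose(
        [transforms.Resize((224, 224)),  # (height, width)
         transforms.ToTensor(),
         normalize])

    train_raw = dataset.ImageFolder(PATH + "train", transform=transform)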

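You can also make your DataLoader faster by setting num_workers so that several CPU processes load images in parallel, and by picking a batch_size that fits in GPU memory, for example:

    from torch.utils.data import DataLoader
    # Assumes dogscats/ also has a valid/ folder; dataset, PATH and transform as above
    testset = dataset.ImageFolder(PATH + "valid", transform=transform)
    testloader = DataLoader(testset, batch_size=16, shuffle=False, num_workers=4)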

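This should make your input pipeline noticeably faster, because the worker processes decode and resize upcoming batches in the background while the model is busy with the current one.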

answered Sep 28 '22 16:09

macharya
