Is it safe to apply the ToTensor() transform before ColorJitter, RandomHorizontalFlip, and other data augmentation transforms in PyTorch?

I see that ToTensor always comes at the end, for example in https://github.com/pytorch/examples/blob/main/imagenet/main.py:

        train_dataset = datasets.ImageFolder(
            traindir,
            transforms.Compose([
                transforms.RandomResizedCrop(224),
                transforms.RandomHorizontalFlip(),
                transforms.ToTensor(),
                normalize,
            ]))

        val_dataset = datasets.ImageFolder(
            valdir,
            transforms.Compose([
                transforms.Resize(256),
                transforms.CenterCrop(224),
                transforms.ToTensor(),
                normalize,
            ]))

Another example, for mini-ImageNet:

        train_data_transform = Compose([
            # ToPILImage(),
            RandomCrop(size, padding=8),
            # decided 8 due to https://datascience.stackexchange.com/questions/116201/when-to-use-padding-when-randomly-cropping-images-in-deep-learning
            ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
            RandomHorizontalFlip(),
            ToTensor(),
            normalize,
        ])
        test_data_transform = transforms.Compose([
            transforms.Resize((size, size)),
            transforms.ToTensor(),
            transforms.Normalize(mean=mean,
                                 std=std),
        ])
        validation_data_transform = test_data_transform

Is there a reason for this ordering? What happens if we apply ToTensor (or ToTensor and Normalize) first, as in the sketch below?
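Here is a minimal sketch of the reordering I have in mind (assuming a recent torchvision, >= 0.8, where RandomCrop, ColorJitter, and RandomHorizontalFlip also accept tensor inputs; size and normalize are the same placeholders as above):

        train_data_transform = transforms.Compose([
            transforms.ToTensor(),  # PIL image -> float tensor in [0, 1]
            transforms.RandomCrop(size, padding=8),
            transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
            transforms.RandomHorizontalFlip(),
            normalize,  # Normalize only works on tensors, so it must follow ToTensor
        ])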

Charlie Parker asked Oct 18 '25 15:10

1 Answer

One reason people usually perform image preprocessing in the PIL format rather than on tensors has to do with the resizing algorithms, especially downsizing. As pointed out in this paper,

The Lanczos, bicubic, and bilinear implementations by PIL (top row) adjust the antialiasing filter width by the downsampling factor (marked as ). Other implementations (including those used for PyTorch-FID and TensorFlow-FID) use fixed filter widths, introducing aliasing artifacts, and resemble naive nearest subsampling.

Image source: the official clean-fid GitHub repo.

Although torchvision.transforms.Resize now supports an antialias option, per the documentation it only reduces the gap between the tensor outputs and the PIL outputs to some extent. Another reason may be reproducibility and portability, i.e. keeping a single, universal preprocessing pipeline. One rough way to check the remaining gap is sketched below.
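A hypothetical sketch of how one might measure that remaining gap (assuming torchvision >= 0.13, where the antialias argument is honored for tensor inputs; the random image is a stand-in for real data):

        import numpy as np
        import torch
        from PIL import Image
        from torchvision.transforms import InterpolationMode
        from torchvision.transforms import functional as F

        # random uint8 image as a stand-in for real data
        rng = np.random.default_rng(0)
        arr = rng.integers(0, 256, size=(512, 512, 3), dtype=np.uint8)

        pil_img = Image.fromarray(arr)
        tensor_img = torch.from_numpy(arr).permute(2, 0, 1)  # HWC -> CHW

        # PIL path: PIL applies its own antialiasing filter when downsizing
        pil_out = F.to_tensor(
            F.resize(pil_img, [64, 64], interpolation=InterpolationMode.BILINEAR)
        )

        # tensor path: antialias=True narrows, but does not fully close, the gap
        tensor_out = F.resize(
            tensor_img, [64, 64],
            interpolation=InterpolationMode.BILINEAR, antialias=True,
        ).float() / 255.0

        print((pil_out - tensor_out).abs().max())  # small, but typically nonzero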

TQCH answered Oct 20 '25 11:10