To feed an image to a PyTorch network, I first need to downscale it to a fixed size. At first I did this with the PIL.Image.resize() method, with the interpolation mode set to BILINEAR. Then I thought it would be more convenient to convert a batch of images to a PyTorch tensor first and use torch.nn.functional.interpolate() to scale the whole tensor at once on a GPU (also with 'bilinear' interpolation mode). This led to a decrease in model accuracy, because the type of scaling used during inference (torch) now differed from the one used during training (PIL). I then compared the two downscaling methods visually and found that they produce different results. Pillow's downscaling looks smoother. Do these methods perform different operations under the hood, even though both are bilinear? If so, is there a way to get the same result as Pillow image scaling with torch tensor scaling?
Original image (the well-known Lenna image)
Pillow scaled image:
Torch scaled image:
Mean channel absolute difference map:
Demo code:
import numpy as np
from PIL import Image
import torch
import torch.nn.functional as F
from torchvision import transforms
import matplotlib.pyplot as plt
pil_to_torch = transforms.ToTensor()
res_shape = (128, 128)
pil_img = Image.open('Lenna.png')
torch_img = pil_to_torch(pil_img)
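# Note: PIL's resize takes (width, height), F.interpolate takes (height, width);
# the target is square here, so the same tuple works for both.
pil_image_scaled = pil_img.resize(res_shape, Image.BILINEAR)
# align_corners=False is the default behaviour; passing it explicitly avoids the warning.
torch_img_scaled = F.interpolate(torch_img.unsqueeze(0), res_shape, mode='bilinear', align_corners=False).squeeze(0)
pil_image_scaled_on_torch = pil_to_torch(pil_image_scaled)
# Relative difference; this is inf/nan if the Pillow result contains exact zeros.
relative_diff = torch.abs((pil_image_scaled_on_torch - torch_img_scaled) / pil_image_scaled_on_torch).mean().item()
print('relative pixel diff:', relative_diff)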
pil_image_scaled_numpy = pil_image_scaled_on_torch.cpu().numpy().transpose([1, 2, 0])
torch_img_scaled_numpy = torch_img_scaled.cpu().numpy().transpose([1, 2, 0])
plt.imsave('pil_scaled.png', pil_image_scaled_numpy)
plt.imsave('torch_scaled.png', torch_img_scaled_numpy)
plt.imsave('mean_diff.png', np.abs(pil_image_scaled_numpy - torch_img_scaled_numpy).mean(-1))
Python 3.6.6, requirements:
cycler==0.10.0
kiwisolver==1.1.0
matplotlib==3.2.1
numpy==1.18.2
Pillow==7.0.0
pyparsing==2.4.6
python-dateutil==2.8.1
six==1.14.0
torch==1.4.0
torchvision==0.5.0
"Bilinear interpolation" is an interpolation method.
But downscaling an image is not necessarily done by interpolation alone.
It is possible to simply resample the image at a lower sampling rate, using an interpolation method to compute the new samples that don't coincide with old samples. But this leads to aliasing. Aliasing is what you get when higher-frequency components in the image cannot be represented at the lower sampling density: the energy of these higher frequencies is "aliased" onto lower-frequency components, so new low-frequency components appear in the image after resampling.
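As a rough one-dimensional illustration of this effect (a sketch, not what either library does internally), the snippet below subsamples a high-frequency sine by a factor of 4, once naively and once after a simple box filter. The signal length, the frequency, and the box filter are all choices made for this example only.

```python
import math

n, factor, cycles = 64, 4, 28  # 28 cycles cannot be represented by 16 samples

signal = [math.sin(2 * math.pi * cycles * i / n) for i in range(n)]

# Naive resampling: keep every 4th sample, with no filtering beforehand.
naive = signal[::factor]

# Low-pass first (here a crude box filter: the mean of each block of 4),
# then keep one value per block.
filtered = [sum(signal[i:i + factor]) / factor for i in range(0, n, factor)]

naive_peak = max(abs(v) for v in naive)
filtered_peak = max(abs(v) for v in filtered)
print('peak after naive subsampling:   ', naive_peak)
print('peak after filter + subsampling:', filtered_peak)
```

The naive result still has close to full amplitude, but as a slow oscillation that was never in the original signal. The filtered result is strongly attenuated, which is the behaviour you want for content the smaller grid cannot represent.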
To avoid aliasing, some libraries apply a low-pass filter (removing the high frequencies that cannot be represented at the lower sampling frequency) before resampling. The subsampling algorithm in these libraries does much more than just interpolate.
The difference you see arises because the two libraries take different approaches: Pillow tries to avoid aliasing by low-pass filtering, while torch.nn.functional.interpolate() does not.
To obtain the same results in Torch as in Pillow, you need to explicitly low-pass filter the image yourself. To get identical results, you will have to figure out exactly how Pillow filters the image, since there are different methods and different possible parameter settings. Looking at the source code is the best way to find out exactly what it does.
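One way to do the filtering and the resampling in a single step is sketched below. It widens the triangle (bilinear) kernel by the downscale factor and applies it separably as two matrix multiplications. The function names `triangle`, `resample_matrix` and `resize_antialiased` are made up for this example, and the kernel placement follows my reading of how Pillow computes its resampling coefficients, so treat it as an approximation: Pillow works on 8-bit data with its own rounding, and the output should not be expected to match bit for bit.

```python
import torch

def triangle(x):
    """Triangle (bilinear) kernel: 1 - |x| on [-1, 1], zero elsewhere."""
    return torch.clamp(1.0 - x.abs(), min=0.0)

def resample_matrix(in_size, out_size):
    """Build an (out_size, in_size) weight matrix for filtered bilinear resizing."""
    scale = in_size / out_size
    filterscale = max(scale, 1.0)   # widen the kernel only when downscaling
    support = 1.0 * filterscale     # the triangle kernel has support 1 before widening
    weights = torch.zeros(out_size, in_size)
    for i in range(out_size):
        center = (i + 0.5) * scale  # output pixel centre in input coordinates
        lo = max(int(center - support + 0.5), 0)
        hi = min(int(center + support + 0.5), in_size)
        idx = torch.arange(lo, hi, dtype=torch.float32)
        w = triangle((idx + 0.5 - center) / filterscale)
        weights[i, lo:hi] = w / w.sum()  # each row sums to 1
    return weights

def resize_antialiased(img, out_h, out_w):
    """Resize a float tensor of shape (..., H, W) to (..., out_h, out_w)."""
    wh = resample_matrix(img.shape[-2], out_h).to(img)  # match dtype and device
    ww = resample_matrix(img.shape[-1], out_w).to(img)
    return wh @ img @ ww.t()
```

With the tensors from the question, this would be called as `resize_antialiased(torch_img.unsqueeze(0), 128, 128)`. Newer PyTorch releases than the 1.4.0 listed in the question also accept `antialias=True` in `F.interpolate` for the bilinear and bicubic modes, which may be the simpler option if upgrading is possible.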