torchvision transforms.Resize() vs cv2.resize()

Tags:

python

pytorch

The CNN model takes an image tensor of size 112x112 as input and returns a 1x512 tensor as output.

Resizing the input to 112x112 with OpenCV's cv2.resize() and with torchvision's transforms.Resize() produces different model outputs.

What is the reason for this? (I understand that differences between the underlying OpenCV and torchvision resizing implementations may be the cause, but I'd like a detailed understanding of it.)

import cv2
import numpy as np 
from PIL import Image
import torch
import torchvision
from torchvision import transforms as trans


# device for pytorch
device = torch.device('cuda:0')

torch.set_default_tensor_type('torch.cuda.FloatTensor')

model = torch.jit.load("traced_facelearner_model_new.pt")
model.eval()

# read the example image used for tracing
image=cv2.imread("videos/example.jpg")

# pipeline 1: expects an image already resized with cv2
test_transform = trans.Compose([
    trans.ToTensor(),
    trans.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
])
# pipeline 2: resizes with torchvision (PIL backend) before converting
test_transform2 = trans.Compose([
    trans.Resize((112, 112)),
    trans.ToTensor(),
    trans.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
])

resized_image = cv2.resize(image, (112, 112))

tensor1 = test_transform(resized_image).to(device).unsqueeze(0)
tensor2 = test_transform2(Image.fromarray(image)).to(device).unsqueeze(0)
output1 = model(tensor1)
output2 = model(tensor2)

The output1 and output2 tensors have different values.

Arki99 asked Sep 12 '25 04:09



1 Answer

Basically, torchvision.transforms.Resize() uses PIL.Image.BILINEAR interpolation by default.

In your code, cv2.resize() is called without an interpolation argument, so it falls back to OpenCV's own default instead of PIL's resampling.

For example:

import cv2
from PIL import Image
import numpy as np

a = cv2.imread('videos/example.jpg')
b = cv2.resize(a, (112, 112))
c = np.array(Image.fromarray(a).resize((112, 112), Image.BILINEAR))

You will see that b and c are slightly different.
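
To put a number on "slightly different", a small helper along these lines may be useful. It is only a sketch: the name diff_stats and the tiny stand-in arrays are made up for illustration, and in practice you would pass b and c from the snippet above.

```python
import numpy as np

def diff_stats(x, y):
    """Return (max, mean) absolute per-pixel difference between two uint8 images."""
    if x.shape != y.shape:
        raise ValueError(f"shape mismatch: {x.shape} vs {y.shape}")
    # Cast to a signed type first: subtracting uint8 arrays directly wraps around.
    delta = np.abs(x.astype(np.int16) - y.astype(np.int16))
    return int(delta.max()), float(delta.mean())

# Hypothetical stand-in data; replace with diff_stats(b, c) on real images.
x = np.array([[10, 200], [0, 255]], dtype=np.uint8)
y = np.array([[12, 198], [0, 250]], dtype=np.uint8)
print(diff_stats(x, y))
```

A small mean difference with a larger maximum would suggest the two resizers mostly agree and diverge around edges and fine texture.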

Edit:

Actually, the OpenCV docs say:

INTER_LINEAR - a bilinear interpolation (used by default)

Even so, it doesn't give the same result as PIL.

Edit 2:

This is also in the docs:

To shrink an image, it will generally look best with INTER_AREA interpolation

And apparently

d = cv2.resize(a, (112, 112), interpolation=cv2.INTER_AREA)

gives almost the same result as c. Unfortunately, these observations don't fully answer the question.
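
One likely explanation is that the two "bilinear" modes differ when shrinking. As far as I understand it, OpenCV's INTER_LINEAR blends only the two nearest source pixels per axis. Pillow's BILINEAR is documented as using all pixels that may contribute to the output value, which amounts to a filter that widens with the downscale factor. The 1D sketch below illustrates that idea in pure Python. The function names are hypothetical and this is a simplified model under those assumptions, not either library's actual code.

```python
import math

def resize_two_tap(src, out_len):
    """Linear interpolation between the two nearest source samples only."""
    scale = len(src) / out_len
    out = []
    for i in range(out_len):
        # Map the centre of output sample i back into source coordinates.
        x = (i + 0.5) * scale - 0.5
        x = min(max(x, 0.0), len(src) - 1.0)
        lo = int(math.floor(x))
        hi = min(lo + 1, len(src) - 1)
        frac = x - lo
        out.append((1.0 - frac) * src[lo] + frac * src[hi])
    return out

def resize_scaled_triangle(src, out_len):
    """Triangle filter whose half-width grows with the downscale factor."""
    scale = len(src) / out_len
    support = max(scale, 1.0)  # never narrower than one source pixel
    out = []
    for i in range(out_len):
        # Continuous source coordinates: source pixel j is centred at j + 0.5.
        center = (i + 0.5) * scale
        lo = max(int(math.floor(center - support)), 0)
        hi = min(int(math.ceil(center + support)), len(src))
        weights = [max(0.0, 1.0 - abs((j + 0.5 - center) / support))
                   for j in range(lo, hi)]
        total = sum(weights)
        out.append(sum(w * src[j] for w, j in zip(weights, range(lo, hi))) / total)
    return out

# Two bright spikes on a dark background, shrunk from 8 samples to 2.
signal = [0, 0, 0, 255, 0, 0, 0, 255]
print(resize_two_tap(signal, 2))          # the spikes fall between the sampled pairs
print(resize_scaled_triangle(signal, 2))  # the spikes contribute to both outputs
```

On this signal the two-tap version returns only zeros, while the widened filter keeps some of the spike energy. If this model holds, it would also explain why INTER_AREA, which averages over the source region covered by each output pixel, lands closer to PIL when shrinking.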

Natthaphon Hongcharoen answered Sep 14 '25 17:09
