 

Why does a PyTorch model accept inputs of multiple image sizes?

I am using a simple object detection model in PyTorch for inference.

When I iterate over the images one at a time:

for k, image_path in enumerate(image_list):
    image = imgproc.loadImage(image_path)  # assumed to return a [1, 3, H, W] tensor
    print(image.shape)
    x = image.cuda()                       # move to GPU before the forward pass
    with torch.no_grad():
        y, feature = net(x)

It prints out variable-sized image shapes, such as

torch.Size([1, 3, 384, 320])

torch.Size([1, 3, 704, 1024])

torch.Size([1, 3, 1280, 1280])

So when I run batch inference using a DataLoader with the same transformation, the code fails. However, when I resize all the images to 600×600, batch processing runs successfully, as in the sketch below.
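For reference, here is a minimal sketch of the kind of batch setup I mean. The ImageDataset class and the use of torchvision transforms here are illustrative, not my exact code; image_list is the same list of paths as above.

from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
from PIL import Image

class ImageDataset(Dataset):
    # Hypothetical dataset wrapper for illustration.
    def __init__(self, image_paths):
        self.image_paths = image_paths
        # Resizing every image to a fixed 600x600 gives all samples the
        # same shape, so the default collate can stack them into a batch.
        self.transform = transforms.Compose([
            transforms.Resize((600, 600)),
            transforms.ToTensor(),
        ])

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image = Image.open(self.image_paths[idx]).convert("RGB")
        return self.transform(image)

loader = DataLoader(ImageDataset(image_list), batch_size=8)
for batch in loader:
    print(batch.shape)  # torch.Size([8, 3, 600, 600])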

I have two doubts:

First, why is PyTorch capable of taking dynamically sized inputs in a deep learning model, and second, why does dynamically sized input fail in batch processing?

asked Jul 03 '20 by Abhik Sarkar


1 Answer

PyTorch has what is called a Dynamic Computational Graph.

It allows the graph of the neural network to dynamically adapt to its input size, from one input to the next, during training or inference. This is what you observe in your first example: providing an image as a tensor of size [1, 3, 384, 320] to your model, then another one as a tensor of size [1, 3, 704, 1024], and so forth, is completely fine, as the model dynamically adapts to each input.
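Here is a minimal sketch of that behavior, using a single Conv2d layer as a stand-in for a fully convolutional model (convolutions do not care about spatial size):

import torch
import torch.nn as nn

# A single conv layer standing in for a fully convolutional detector.
net = nn.Conv2d(3, 8, kernel_size=3, padding=1)

with torch.no_grad():
    for h, w in [(384, 320), (704, 1024), (1280, 1280)]:
        y = net(torch.randn(1, 3, h, w))  # a fresh graph is built for each input
        print(y.shape)                    # spatial size follows the input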

However, if your input is actually a collection of inputs (a batch), it is another story. A batch, for PyTorch, is a single tensor with one extra dimension. For example, if you provide a list of n images, each of size [3, 384, 320], PyTorch will stack them, so that your model receives a single tensor input of shape [n, 3, 384, 320].
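You can reproduce this stacking yourself; torch.stack is essentially what the DataLoader's default collate function does:

import torch

images = [torch.randn(3, 384, 320) for _ in range(4)]  # four same-sized samples
batch = torch.stack(images)  # roughly what the default collate does
print(batch.shape)           # torch.Size([4, 3, 384, 320])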

This "stacking" can only happen between images of the same shape. To provide a more "intuitive" explanation than previous answers, this stacking operation cannot be done between images of different shapes, because the network cannot "guess" how the different images should "align" with one another in a batch, if they are not all the same size.

Whether it happens during training or testing, if you create a batch out of images of varying sizes, PyTorch will refuse your input.
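For example (the exact error message may vary between PyTorch versions):

import torch

a = torch.randn(3, 384, 320)
b = torch.randn(3, 704, 1024)
torch.stack([a, b])  # RuntimeError: stack expects each tensor to be equal size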

Several solutions are commonly used: resizing, as you did; padding (often with zeros or other constant values at the borders of your images) to extend the smaller images to the size of the largest one, as sketched below; and so forth.
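As a sketch of the padding approach, here is a hypothetical collate function (pad_collate is my name, not a PyTorch built-in) that pads every image in a batch to the largest height and width, then stacks them:

import torch
import torch.nn.functional as F

def pad_collate(images):
    # Pad each image on the right and bottom with zeros, up to the
    # largest height and width in the batch, then stack.
    max_h = max(img.shape[1] for img in images)
    max_w = max(img.shape[2] for img in images)
    padded = [F.pad(img, (0, max_w - img.shape[2], 0, max_h - img.shape[1]))
              for img in images]
    return torch.stack(padded)

batch = pad_collate([torch.randn(3, 384, 320), torch.randn(3, 704, 1024)])
print(batch.shape)  # torch.Size([2, 3, 704, 1024])

You would then pass it to your loader as DataLoader(dataset, batch_size=..., collate_fn=pad_collate). Note that whether padding is acceptable depends on the model; for detection models it usually is, since padding only adds empty border regions.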

answered Sep 27 '22 by clef