
Why do we use the unsqueeze() function in image processing?

I was working on a guided project related to image processing. During the image preprocessing step, the instructor used the unsqueeze(0) function to set up the batch size. I would like to know what happens after changing the batch size. The code is given below for your reference.

I would be very thankful for a quick response.

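from PIL import Image
from torchvision import transforms as T

def preprocess(img_path, max_size=500):
  # Load the image and force three color channels (RGB)
  image = Image.open(img_path).convert('RGB')

  # Cap the value passed to T.Resize at max_size
  if max(image.size) > max_size:
    size = max_size
  else:
    size = max(image.size)

  img_transform = T.Compose([
    T.Resize(size),
    T.ToTensor(),  # PIL image -> float tensor of shape [3, H, W]
    T.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
  ])

  image = img_transform(image)
  # Add a leading dimension of size 1: [3, H, W] -> [1, 3, H, W]
  image = image.unsqueeze(0)
  return image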
Rachit S Garg asked Sep 21 '25 07:09



1 Answer

unsqueeze is most likely used here because you are working with a convolutional neural network.

When you load an image, it typically has 3 dimensions: width, height, and number of color channels. Grayscale images have 1 color channel; color images have 3 (red, green, and blue, or RGB). So, in your case, once the image is loaded and converted to a tensor, it has shape:

image = img_transform(image) # the resulting image has shape [3, H, W]

Note that the [channels, height, width] ordering is a PyTorch convention. Other libraries and software may order the dimensions differently.
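To make the ordering concrete, here is a minimal sketch. It assumes a stand-in tensor stored channels-last ([H, W, C]), which is how some other tools lay out images; the sizes 100 and 120 are made up for illustration.

```python
import torch

# Hypothetical image stored channels-last: [H, W, C] = [100, 120, 3]
hwc = torch.zeros(100, 120, 3)

# Move the channel axis to the front to get PyTorch's [C, H, W] layout
chw = hwc.permute(2, 0, 1)

print(hwc.shape)  # torch.Size([100, 120, 3])
print(chw.shape)  # torch.Size([3, 100, 120])
```

In the question's code, T.ToTensor() already returns the tensor in [C, H, W] order, so no manual permute is needed there.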

However, 3 dimensions are not enough for a 2D convolutional neural network. In deep learning, data is processed in batches. So, instead of processing just one image at a time, a convolutional neural network processes N images at the same time, in parallel. We call this collection of images a batch. So instead of dimensions [C, H, W], you'll have [N, C, H, W] (this is the input shape documented for torch.nn.Conv2d). For example, a batch of 64 color images of size 100 by 100 would have the shape:

[64, 3, 100, 100]
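As a rough illustration of how such a batch could be built, the sketch below stacks 64 random stand-in tensors with torch.stack. The random data is an assumption; in practice a DataLoader usually assembles batches for you.

```python
import torch

# 64 stand-in "images", each shaped [C, H, W] = [3, 100, 100]
images = [torch.rand(3, 100, 100) for _ in range(64)]

# torch.stack adds a new leading dimension and joins the images along it
batch = torch.stack(images)

print(batch.shape)  # torch.Size([64, 3, 100, 100])
```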

If you want to process only one image at a time, you still need to put it into batch form for the model to accept it. For example, an image of shape [3, 100, 100] needs to become [1, 3, 100, 100]. This is what unsqueeze(0) does: it inserts a new dimension of size 1 at position 0.

image = img_transform(image) # [3, H, W]
image = image.unsqueeze(0) # [1, 3, H, W]
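Here is a small self-contained check of that behavior. It uses a random tensor in place of a real image, and the Conv2d layer (8 output channels, 3x3 kernel) is an arbitrary stand-in for a model, not the one from your project.

```python
import torch

# Stand-in for the transformed image: [C, H, W]
image = torch.rand(3, 100, 100)

# Insert a batch dimension of size 1 at position 0
batched = image.unsqueeze(0)
print(batched.shape)  # torch.Size([1, 3, 100, 100])

# squeeze(0) undoes it; the pixel values are unchanged
restored = batched.squeeze(0)
print(torch.equal(image, restored))  # True

# A convolution layer accepts the batched tensor as a batch of one image
conv = torch.nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)
out = conv(batched)
print(out.shape)  # torch.Size([1, 8, 98, 98])
```

So the batch size here is 1: the data itself is untouched, and only the shape gains a leading dimension.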
Jay Mody answered Sep 22 '25 21:09
