I am working on a guided project related to image processing. In it, the instructor used the unsqueeze(0) function to set up the batch size. I would like to know what happens to the image after the batch size is changed this way. The code is given below for your reference.
I would be very thankful for a quick response.
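from PIL import Image
from torchvision import transforms as T

def preprocess(img_path, max_size=500):
    # Load the image and force three color channels (RGB).
    image = Image.open(img_path).convert('RGB')
    # Cap the resize target at max_size.
    if max(image.size) > max_size:
        size = max_size
    else:
        size = max(image.size)
    img_transform = T.Compose([
        T.Resize(size),  # with an int, the shorter side is resized to `size`
        T.ToTensor(),    # PIL image -> float tensor of shape [3, H, W]
        T.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
    ])
    image = img_transform(image)
    image = image.unsqueeze(0)  # add the batch dimension: [1, 3, H, W]
    return image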
The unsqueeze is likely used here because you are working with a convolutional neural network.
When you load an image, it will typically have 3 dimensions: width, height, and number of color channels. For grayscale images, the number of color channels is 1; for colored images, there are 3 color channels (red, green, and blue, RGB). So, in your case, when you load the image and store it as a tensor, it has shape:
image = img_transform(image) # the resulting image has shape [3, H, W]
Note that the dimensions are ordered [channels, height, width], and not some other way, because that is the convention PyTorch uses. Other libraries and software may order them differently.
However, 3 dimensions are not enough for a 2D convolutional neural network. In deep learning, data is processed in batches. So, instead of processing just one image at a time, a convolutional neural network processes N images at the same time in parallel. We call this collection of images a batch. So instead of dimensions [C, H, W], you'll have [N, C, H, W]. For example, a batch of 64 colored 100 by 100 images has the shape:
[64, 3, 100, 100]
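As a rough illustration of that batch shape, the sketch below stacks 64 random tensors (stand-ins for real images) and passes them through a single convolution layer. The layer and its settings are arbitrary choices for the demo, not something taken from the guided project:

```python
import torch
from torch import nn

# 64 random stand-ins for colored 100 x 100 images, each of shape [3, 100, 100].
images = [torch.rand(3, 100, 100) for _ in range(64)]

# torch.stack joins them along a new leading (batch) dimension.
batch = torch.stack(images)
print(batch.shape)     # torch.Size([64, 3, 100, 100])

# A hypothetical convolution layer, chosen only for this demo.
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
out = conv(batch)
print(out.shape)       # torch.Size([64, 8, 100, 100])
```

The batch dimension N passes through the layer unchanged; only the channel count changes here.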
Now, if you only want to process one image at a time, you still need to put it into batch form for a model to accept it. For example, if you have an image of shape [3, 100, 100], you'd need to convert it to [1, 3, 100, 100]. This is what unsqueeze(0) does:
image = img_transform(image) # [3, H, W]
image = image.unsqueeze(0) # [1, 3, H, W]
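To answer what happens to the image itself, here is a small sketch using a random tensor in place of a real preprocessed image. It suggests that unsqueeze(0) only changes the shape and leaves the pixel values untouched, and that squeeze(0) undoes it:

```python
import torch

# Stand-in for a preprocessed image: 3 channels, 100 x 100 pixels.
image = torch.rand(3, 100, 100)

# Add a batch dimension of size 1 at position 0.
batched = image.unsqueeze(0)
print(batched.shape)               # torch.Size([1, 3, 100, 100])

# The only element of the batch is the original image, values unchanged.
same_values = torch.equal(batched[0], image)
print(same_values)                 # True

# squeeze(0) removes the size-1 batch dimension again.
restored = batched.squeeze(0)
print(restored.shape)              # torch.Size([3, 100, 100])
```

So the "batch size" here is simply 1: a batch that contains your single image.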