I am testing out the pretrained Inception v3 model on PyTorch. I fed it an image of size 256x256, and I also resized the same image up to 299x299. In both cases, the image was classified correctly.
Can someone explain why the PyTorch pretrained model can accept an image that's not 299x299?
It's because the PyTorch implementation of Inception v3 uses an adaptive average pooling layer right before the fully connected layer.
If you take a look at the Inception3 class in torchvision/models/inception.py, the operation of most interest with respect to your question is x = F.adaptive_avg_pool2d(x, (1, 1)). Since the average pooling is adaptive, the height and width of x before pooling are independent of the output shape. In other words, after this operation we always get a tensor of size [b, c, 1, 1], where b and c are the batch size and number of channels respectively. This way, the input to the fully connected layer is always the same size, so no exception is raised.
That said, if you're using the pretrained Inception v3 weights, then the model was originally trained for inputs of size 299x299. Using inputs of a different size may have a negative impact on loss/accuracy, although smaller input images will almost certainly decrease computational time and memory footprint since the feature maps will be smaller.
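You can check this end to end with torchvision; a quick sketch (the way pretrained weights are requested differs slightly across torchvision versions, so adjust as needed):

```python
import torch
from torchvision import models

# Older torchvision versions use pretrained=True; newer ones prefer
# weights="DEFAULT" or weights=models.Inception_V3_Weights.DEFAULT.
model = models.inception_v3(pretrained=True)
model.eval()  # eval mode skips the auxiliary classifier and returns plain logits

with torch.no_grad():
    out_299 = model(torch.randn(1, 3, 299, 299))  # the size the weights were trained on
    out_256 = model(torch.randn(1, 3, 256, 256))  # a different size still runs fine

print(out_299.shape, out_256.shape)  # both torch.Size([1, 1000])
```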