I loaded a custom PyTorch model and I want to find out its input shape. Something like this:
model.input_shape
Is it possible to get this information?
Update: print() and summary() don't show this model's input shape, so they are not what I'm looking for.
PyTorch models are very flexible objects, to the point where they do not enforce or generally expect a fixed input shape for data.
If you have certain layers there may be constraints, e.g.:
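(A minimal sketch; these layer sizes are arbitrary, chosen only to illustrate what each layer does and does not constrain.)

    import torch.nn as nn

    # Conv2d constrains the number of input channels (3 here),
    # but not the height/width of the image it is applied to.
    conv = nn.Conv2d(3, 32, kernel_size=3)

    # Linear constrains the size of the last dimension (100 here),
    # but not how many leading (e.g. batch) dimensions precede it.
    fc = nn.Linear(100, 10)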
But as you can see, neither of these enforces the total shape of the data.
We might not realize it right now, but in more complex models, getting the size of the first linear layer right is sometimes a source of frustration. We’ve heard stories of famous practitioners putting in arbitrary numbers and then relying on error messages from PyTorch to backtrack the correct sizes for their linear layers. Lame, eh? Nah, it’s all legit!
If your model's first layer is a fully connected one, then the first layer in print(model) will detail the expected dimensionality of a single sample.
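For instance, with a hypothetical model whose first layer is fully connected (the sizes here are made up for illustration):

    import torch.nn as nn

    model = nn.Sequential(nn.Linear(784, 10))
    print(model)
    # Sequential(
    #   (0): Linear(in_features=784, out_features=10, bias=True)
    # )
    # -> each sample is expected to have 784 features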
If it is a convolutional layer however, since these are dynamic and will stride as long/wide as the input permits, there is no simple way to retrieve this info from the model itself.[1] This flexibility means that for many architectures, multiple compatible input sizes[2] will all be acceptable by the network.
This is a feature of PyTorch's dynamic computational graph.
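A quick way to see this flexibility (a sketch, not part of the question's model): the same convolutional layer accepts inputs of different spatial sizes without complaint.

    import torch
    import torch.nn as nn

    conv = nn.Conv2d(3, 32, kernel_size=3, padding=1)
    print(conv(torch.randn(1, 3, 16, 16)).shape)    # torch.Size([1, 32, 16, 16])
    print(conv(torch.randn(1, 3, 128, 128)).shape)  # torch.Size([1, 32, 128, 128])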
What you will need to do is investigate the network architecture and, once you've found an interpretable layer (if one is present, e.g. a fully connected one), "work backwards" from its dimensions, determining how the previous layers (e.g. poolings and convolutions) have compressed/modified it.
For example, in the following model from Deep Learning with PyTorch (8.5.1):
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class NetWidth(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
            self.conv2 = nn.Conv2d(32, 16, kernel_size=3, padding=1)
            self.fc1 = nn.Linear(16 * 8 * 8, 32)
            self.fc2 = nn.Linear(32, 2)

        def forward(self, x):
            out = F.max_pool2d(torch.tanh(self.conv1(x)), 2)
            out = F.max_pool2d(torch.tanh(self.conv2(out)), 2)
            out = out.view(-1, 16 * 8 * 8)  # flatten for the fully connected layers
            out = torch.tanh(self.fc1(out))
            out = self.fc2(out)
            return out
We see the model takes an input 2D image with 3 channels, and:
- Conv2d -> sends it to an image of the same size with 32 channels
- max_pool2d(,2) -> halves the size of the image in each dimension
- Conv2d -> sends it to an image of the same size with 16 channels
- max_pool2d(,2) -> halves the size of the image in each dimension
- view -> reshapes the image
- Linear -> takes a tensor of size 16 * 8 * 8 and sends it to size 32
So working backwards, we have:
- fc1 takes a tensor of size 16 * 8 * 8, so prior to the view the tensor was of shape (channels, 8, 8)
- prior to the second max_pool2d it was (channels, 16, 16)[2]
- and prior to the first max_pool2d it was (channels, 32, 32)
So, assuming the kernel_size and padding are sufficient for the convolutions themselves to maintain image dimensions (which holds here: kernel_size=3 with padding=1 preserves height and width), it is likely that the input image is of shape (3, 32, 32), i.e. RGB 32x32 pixel square images.
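We can sanity-check this with a dummy forward pass (using the NetWidth class and imports above):

    model = NetWidth()
    x = torch.randn(1, 3, 32, 32)  # one RGB 32x32 image
    print(model(x).shape)          # torch.Size([1, 2])

    # Most other spatial sizes fail at the view, because the flattened
    # elements no longer divide evenly into rows of 16 * 8 * 8:
    # model(torch.randn(1, 3, 48, 48))  # RuntimeError: invalid shape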
Notes:
1. Even the external package pytorch-summary requires you provide the input shape in order to display the shape of the output of each layer.
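For instance, a minimal usage sketch (assuming the torchsummary package and its summary(model, input_size, ...) entry point):

    from torchsummary import summary

    # The input shape must be supplied; it cannot be inferred from the model:
    summary(NetWidth(), input_size=(3, 32, 32), device="cpu")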
2. It could however be any 2 numbers whose product equals 8*8, e.g. (64, 1), (32, 2), (16, 4), etc.; but since the code is written as 8*8 it is likely the authors used the actual dimensions.