I'm new to convolutional neural networks and want to know how to calculate the output size of each layer of a model, given a PyTorch configuration file similar to the ones used in the instructions at this link.
Most of what I've found so far hasn't been clear or concise. How do I calculate the sizes through each layer? Below is a snippet of a configuration file that would be parsed.
# (3, 640, 640)
[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
# (16, 320, 320)
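In short, there is a common formula for calculating the output dimensions: output = floor((input + 2 * padding - kernel_size) / stride) + 1, applied to height and width separately.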
You can find an explanation in A guide to receptive field arithmetic for Convolutional Neural Networks.
In addition, I'd recommend the excellent article A guide to convolution arithmetic for deep learning,
and the conv_arithmetic repo, which has convolution animations.
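As a rough worked example, here is the usual convolution arithmetic formula (written out in the docstring) applied to the snippet from the question. The helper name `conv_output_size` is made up for illustration, and the sketch assumes that `pad=1` means one pixel of padding on each side and that the maxpool layer uses no padding; a particular config parser may interpret those fields differently.

```python
def conv_output_size(in_size, kernel_size, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer (no dilation).

    out = floor((in + 2 * padding - kernel_size) / stride) + 1
    """
    return (in_size + 2 * padding - kernel_size) // stride + 1


# Input from the question's snippet: (channels, height, width)
channels, height, width = 3, 640, 640

# [convolutional] filters=16 size=3 stride=1 pad=1
# Assumption: pad=1 is treated as one pixel of padding on each side.
channels = 16  # the number of filters becomes the number of output channels
height = conv_output_size(height, kernel_size=3, stride=1, padding=1)
width = conv_output_size(width, kernel_size=3, stride=1, padding=1)

# [maxpool] size=2 stride=2 (assumed to use no padding); channels are unchanged
height = conv_output_size(height, kernel_size=2, stride=2)
width = conv_output_size(width, kernel_size=2, stride=2)

output_shape = (channels, height, width)
print(output_shape)  # (16, 320, 320)
```

The convolution keeps 640 x 640 because the padding offsets the 3 x 3 kernel, and the pooling layer then halves each spatial dimension, which matches the `# (16, 320, 320)` comment in the config.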
Doing the math by hand is error-prone (at least for me), so I let PyTorch work out the shape instead:
import functools
import operator

import torch
from torch import nn

def shape_of_output(shape_of_input, list_of_layers):
    # Push one random sample (batch size 1) through the layers and read off its shape.
    sequential = nn.Sequential(*list_of_layers)
    return tuple(sequential(torch.rand(1, *shape_of_input)).shape)

def size_of_output(shape_of_input, list_of_layers):
    # Total number of elements in the output, e.g. for sizing a following nn.Linear.
    return functools.reduce(operator.mul, shape_of_output(shape_of_input, list_of_layers))
It simply runs a random input through the layers once and returns the shape of the output (including the leading batch dimension of 1). That is a tiny bit wasteful, but it is essentially guaranteed to be correct even as new features/options are added to PyTorch.
#
# example setup
#
import random

out_channel_of_first = random.randint(1, 16)
kernel_size_of_first = random.choice([3, 5, 7, 11])
grayscale_image_shape = (1, 48, 48)
color_image_shape = (3, 48, 48)  # alternative example

#
# example usage
#
print('the output shape will be', shape_of_output(
    shape_of_input=grayscale_image_shape,
    list_of_layers=[
        nn.Conv2d(
            in_channels=grayscale_image_shape[0],
            out_channels=out_channel_of_first,
            kernel_size=kernel_size_of_first,
        ),
        nn.ReLU(),
        nn.MaxPool2d(2, 2),
        # next major layer
        nn.Conv2d(
            in_channels=out_channel_of_first,
            out_channels=5,
            kernel_size=3,
        ),
        nn.ReLU(),
        nn.MaxPool2d(2, 2),
    ],
))
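For the snippet in the question, the same approach might look like the sketch below. The mapping from config fields to PyTorch layers is my assumption rather than what any particular parser does: `batch_normalize=1` is taken to mean an `nn.BatchNorm2d`, `activation=leaky` an `nn.LeakyReLU` with a slope of 0.1, and `pad=1` one pixel of padding. The helper is repeated so the block runs on its own.

```python
import torch
from torch import nn


def shape_of_output(shape_of_input, list_of_layers):
    # Same idea as the helper above: run one random sample through the layers.
    sequential = nn.Sequential(*list_of_layers)
    with torch.no_grad():  # no gradients are needed just to read a shape
        return tuple(sequential(torch.rand(1, *shape_of_input)).shape)


# Assumed translation of the [convolutional] and [maxpool] blocks.
layers_from_config = [
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1),
    nn.BatchNorm2d(16),            # batch_normalize=1
    nn.LeakyReLU(0.1),             # activation=leaky (slope is an assumption)
    nn.MaxPool2d(kernel_size=2, stride=2),
]

result = shape_of_output((3, 640, 640), layers_from_config)
print(result)  # (1, 16, 320, 320), the leading 1 is the batch dimension
```

Dropping the leading batch dimension gives (16, 320, 320), matching the comment at the end of the config snippet.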