How to calculate output sizes after a convolution layer in a configuration file?

Question

I'm new to convolutional neural networks and want to know how to work out the output sizes between the layers of a model, given a PyTorch configuration file similar to the ones used in the instructions at this link.

Most of the stuff I've already looked at hasn't been very clear and concise. How am I supposed to calculate the sizes through each layer? Below is a snippet of a configuration file that would be parsed.

# (3, 640, 640)
[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

# (16, 320, 320)

trsvchn · Accepted Answer

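In short, there is a common formula for calculating output dimensions:

$$n_{out} = \left\lfloor \frac{n_{in} + 2p - k}{s} \right\rfloor + 1$$

where $n_{in}$ is the input size, $n_{out}$ the output size, $k$ the kernel size, $p$ the padding, and $s$ the stride.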

You can find an explanation in A guide to receptive field arithmetic for Convolutional Neural Networks.

In addition, I'd like to recommend the excellent article A guide to convolution arithmetic for deep learning.

There is also the conv_arithmetic repo, which has convolution animations.
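As a rough illustration, the output-size formula can be applied by hand to the snippet from the question. This is only a sketch: `conv_output_size` is a hypothetical helper name, dilation is ignored, and `pad=1` is assumed to mean one pixel of padding (for `size=3` this gives the same result as a `size // 2` convention).

```python
def conv_output_size(n_in, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (no dilation)."""
    return (n_in + 2 * padding - kernel_size) // stride + 1

# Input shape from the question's first comment: (channels, height, width)
channels, height, width = 3, 640, 640

# [convolutional] filters=16 size=3 stride=1 pad=1
channels = 16  # a convolution outputs one channel per filter
height = conv_output_size(height, kernel_size=3, stride=1, padding=1)
width = conv_output_size(width, kernel_size=3, stride=1, padding=1)
after_conv = (channels, height, width)

# [maxpool] size=2 stride=2 (pooling keeps the channel count)
height = conv_output_size(height, kernel_size=2, stride=2)
width = conv_output_size(width, kernel_size=2, stride=2)
after_pool = (channels, height, width)

print(after_conv)  # (16, 640, 640)
print(after_pool)  # (16, 320, 320)
```

Batch normalization and the activation are elementwise, so they leave the shape unchanged, and the result matches the `# (16, 320, 320)` comment in the question.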

Jeff Hykin · Answer

Doing the math by hand is error-prone (at least for me).

The most reliable way I've found:

import torch
from torch import nn

import functools
import operator

def shape_of_output(shape_of_input, list_of_layers):
    # Push one random sample (batch size 1) through the layers and read off the shape.
    sequential = nn.Sequential(*list_of_layers)
    return tuple(sequential(torch.rand(1, *shape_of_input)).shape)

def size_of_output(shape_of_input, list_of_layers):
    # Total number of elements in the output (product of all dimensions).
    return functools.reduce(operator.mul, shape_of_output(shape_of_input, list_of_layers))

It simply runs a random input through the layers once and returns the shape of the output (including the leading batch dimension of 1). So it is a tiny bit wasteful, but it is essentially guaranteed to be correct even as new features and options are added to PyTorch.
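For the configuration snippet in the question, the same idea might look like the sketch below. The mapping from config keys to PyTorch layers is an assumption (in particular `pad=1` is taken as `padding=1`, and the LeakyReLU slope of 0.1 is an assumed value that has no effect on shapes).

```python
import torch
from torch import nn

# Layers corresponding to the question's [convolutional] and [maxpool] blocks.
layers = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1),
    nn.BatchNorm2d(16),
    nn.LeakyReLU(0.1),  # assumed slope; does not change the shape
    nn.MaxPool2d(kernel_size=2, stride=2),
)
layers.eval()  # use BatchNorm running statistics instead of batch statistics

with torch.no_grad():  # only the shape is needed, so skip gradient tracking
    dummy_batch = torch.zeros(1, 3, 640, 640)  # (batch, channels, height, width)
    output_shape = tuple(layers(dummy_batch).shape)

print(output_shape)  # (1, 16, 320, 320)
```

Dropping the leading batch dimension gives `(16, 320, 320)`, which agrees with the comment at the end of the question's snippet.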

Example (runs if copy-pasted after the code above)

# 
# example setup
# 
import random
out_channel_of_first = random.randint(1,16)
kernel_size_of_first = random.choice([3,5,7,11])
grayscale_image_shape = (1, 48, 48)
color_image_shape     = (3, 48, 48) # alternative example

# 
# example usage
# 
print('the output shape will be', shape_of_output(
    shape_of_input=grayscale_image_shape,
    list_of_layers=[         
        nn.Conv2d(
            in_channels=grayscale_image_shape[0],
            out_channels=out_channel_of_first,
            kernel_size=kernel_size_of_first,
        ),
        nn.ReLU(),
        nn.MaxPool2d(2,2),
        
        # next major layer
        nn.Conv2d(
            in_channels=out_channel_of_first,
            out_channels=5,
            kernel_size=3
        ),
        nn.ReLU(),
        nn.MaxPool2d(2,2),
    ],
))
