The input image size of U-Net is 572×572, but the output mask size is 388×388. How can the image be masked with a smaller mask?
You are probably referring to the paper by Ronneberger et al. in which the U-Net architecture was introduced; the architecture diagram there shows these numbers.

The explanation is a bit hidden in section "3. Training" of the paper:
Due to the unpadded convolutions, the output image is smaller than the input by a constant border width.
This means that each convolution "crops" the image: the kernel is only placed at positions where it fully overlaps the input image / input blob of the layer. For 3x3 convolutions, this removes one pixel at each side. For a visual explanation of kernels/convolutions, see e.g. here. The output is smaller because, due to the cropping that occurs with unpadded convolutions, only the inner part of the image gets a result.
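You can verify the 572 → 388 arithmetic yourself. The following sketch traces the spatial size through the layer sequence of the paper's architecture (each unpadded 3x3 convolution trims one pixel per side, each 2x2 max-pool halves the size, each 2x2 up-convolution doubles it):

```python
def unet_output_size(size: int) -> int:
    """Trace the spatial size through U-Net's layers, assuming the
    layer sequence from Ronneberger et al.: valid 3x3 convs,
    2x2 max-pools, and 2x2 up-convolutions."""
    conv = lambda s: s - 2          # unpadded 3x3 conv trims 1 px per side
    # contracting path: two convs, then a 2x2 max-pool, four times
    for _ in range(4):
        size = conv(conv(size)) // 2
    size = conv(conv(size))         # bottleneck: two more convs
    # expanding path: 2x2 up-conv doubles the size, then two convs, four times
    for _ in range(4):
        size = conv(conv(size * 2))
    return size

print(unet_output_size(572))  # 388
```

Running this with an input of 572 reproduces the 388 output size from the paper's figure.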
It is not a general characteristic of the architecture but something inherent to (unpadded) convolutions, and it can be avoided with padding. Probably the most common strategy is mirroring at the image borders, so that each convolution can start at the very edge of the image (and sees mirrored pixels wherever its kernel overlaps the border). Then the input size is preserved and the full image will be segmented.
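As a minimal illustration of the mirroring idea (this is my own sketch, not the paper's code; note that U-Net itself keeps its convolutions unpadded and uses mirroring only for its overlap-tile strategy at image borders):

```python
import numpy as np

def conv3x3_mirrored(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Valid 3x3 convolution after mirror-padding the borders by one
    pixel, so the output keeps the input's spatial size."""
    padded = np.pad(img, 1, mode="reflect")   # mirror one pixel at each side
    h, w = img.shape
    out = np.empty((h, w), dtype=float)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * kernel)
    return out

img = np.arange(25, dtype=float).reshape(5, 5)
blur = np.ones((3, 3)) / 9.0                  # simple averaging kernel
print(conv3x3_mirrored(img, blur).shape)      # (5, 5) - size is preserved
```

Without the `np.pad(..., mode="reflect")` line, the valid convolution of a 5×5 image would yield only a 3×3 result, which is exactly the shrinkage discussed above.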