How does shift-and-stitch in a fully convolutional network work?

I am still struggling with the "shift-and-stitch" trick in FCN, even after reading about it many times.

Can someone give an intuitive explanation?

asked Nov 19 '16 08:11 by lhao0301



2 Answers

While this question has been answered, I found an image that explains shift-and-stitch better. Just imagine your FCN is a single 2x2 max-pooling layer (the numbers in the image represent pixel values, not index values). The input is shifted, each shifted copy is max-pooled, and the pooled results are then stitched (interlaced) back together at the resolution of the original image. [Image: Shift and Stitch]
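The 2x2 max-pooling picture can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names are made up for this example, the "network" is nothing but one pooling layer, the image sides are even, and each shift is taken as a crop of the zero-padded input at offset (dy, dx), which is a simplification of how shifting and padding are handled in practice.

```python
import numpy as np

def max_pool_2x2(x):
    """Non-overlapping 2x2 max pooling (stride 2) on a 2D array with even sides."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def shift_and_stitch_2x2(x):
    """Pool four shifted copies of x and interlace the four coarse outputs."""
    h, w = x.shape
    # Assumption: zero-pad bottom/right so every shifted crop is still h x w.
    padded = np.pad(x, ((0, 1), (0, 1)))
    out = np.empty((h, w), dtype=x.dtype)
    for dy in range(2):
        for dx in range(2):
            shifted = padded[dy:dy + h, dx:dx + w]
            # The output for shift (dy, dx) fills every 2nd pixel, starting at (dy, dx).
            out[dy::2, dx::2] = max_pool_2x2(shifted)
    return out

x = np.arange(16).reshape(4, 4)
print(max_pool_2x2(x))          # coarse 2x2 output
print(shift_and_stitch_2x2(x))  # dense 4x4 output
```

With this setup, each stitched pixel `(i, j)` holds the maximum of the 2x2 window whose top-left corner is `(i, j)`, so the result matches what a stride-1 pooling layer would give, at the cost of four forward passes instead of one.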

answered Jun 26 '23 06:06 by bpinaya


In an FCN, the final output (by default, without any upsampling tricks) has a lower resolution than the input. Suppose the input image is 100x100 and the network output is 10x10. Mapping that output directly back to the input resolution will look patchy, even with higher-order interpolation.

Now take the same input, shift it by a small offset, run the network again, and repeat this for several different shifts. You end up with a set of output images and a shift vector for each one. Using the shift vectors, these outputs can be combined (stitched) into a higher-resolution final semantic map.

One might think of it as taking multiple (shifted) low-resolution images of an object and combining (stitching) them to get a higher-resolution image.
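To make the 100x100 to 10x10 example concrete, here is a rough sketch for a general downsampling factor `f`. It rests on assumptions that are not part of the answer: `coarse_net` is a hypothetical stand-in for a real FCN (plain f x f mean pooling), the image sides are divisible by `f`, and shifts are taken as crops of a zero-padded input.

```python
import numpy as np

def coarse_net(x, f):
    """Hypothetical stand-in for an FCN with total stride f: f x f mean pooling."""
    h, w = x.shape
    return x.reshape(h // f, f, w // f, f).mean(axis=(1, 3))

def shift_and_stitch(x, f, net=coarse_net):
    """Run `net` on all f*f shifts of x and interlace the coarse outputs."""
    h, w = x.shape
    # Assumption: zero-pad bottom/right so each shifted crop keeps the h x w shape.
    padded = np.pad(x, ((0, f - 1), (0, f - 1)))
    dense = np.empty((h, w), dtype=float)
    passes = 0
    for dy in range(f):
        for dx in range(f):
            # Output of shift (dy, dx) lands on every f-th pixel, offset by (dy, dx).
            dense[dy::f, dx::f] = net(padded[dy:dy + h, dx:dx + w], f)
            passes += 1
    return dense, passes

x = np.arange(100 * 100, dtype=float).reshape(100, 100)
dense, passes = shift_and_stitch(x, f=10)
print(coarse_net(x, 10).shape, dense.shape, passes)
```

The stitched map has the input's resolution, but it takes `f * f` forward passes (100 here) to fill it, which is the main cost of the trick.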

answered Jun 26 '23 06:06 by MadRao