I am still struggling with the "shift and stitch" trick in FCN, even after reading about it many times.
Can someone give an intuitive explanation?
Fully Convolutional Networks, or FCNs, are an architecture used mainly for semantic segmentation. They employ solely locally connected layers, such as convolution, pooling and upsampling. Avoiding the use of dense layers means fewer parameters (making the networks faster to train).
A fully convolutional network (FCN) is a neural network that only performs convolution (and subsampling or upsampling) operations. Equivalently, an FCN is a CNN without fully connected layers.
An FCN is a network that does not contain any "Dense" layers (as in traditional CNNs); instead, it contains 1x1 convolutions that perform the task of fully connected (Dense) layers.
While this question has been answered, I found an image that explains shift-and-stitch better. Just imagine your FCN is a single 2x2 max-pooling layer (the numbers represent pixel values, not index values, btw). The input is shifted, each shifted copy is max-pooled, and the results are then stitched back together at the resolution of the original image.
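The 2x2 max-pooling picture can also be sketched in NumPy. This is only an illustration under stated assumptions, not the reference implementation from the FCN paper: the "network" is a single stride-2 max-pooling layer, the input height and width are even, and the names `max_pool_2x2` and `shift_and_stitch` are made up for this example. Shifting is done by cropping a padded copy of the input, and each coarse output is interlaced into the dense grid at the offset of its shift.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 (height and width must be even)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def shift_and_stitch(x, net=max_pool_2x2, f=2):
    """Run `net` (downsampling factor f) on f*f shifted inputs and interlace the outputs."""
    x = np.asarray(x, dtype=float)
    h, w = x.shape
    # Assumption: pad bottom/right with -inf so border windows only see real pixels.
    padded = np.pad(x, ((0, f - 1), (0, f - 1)), constant_values=-np.inf)
    dense = np.empty((h, w))
    for dy in range(f):
        for dx in range(f):
            shifted = padded[dy:dy + h, dx:dx + w]   # input shifted by (dy, dx)
            coarse = net(shifted)                    # shape (h // f, w // f)
            dense[dy::f, dx::f] = coarse             # stitch: interlace by shift
    return dense

x = np.arange(16).reshape(4, 4)
dense = shift_and_stitch(x)
# dense is
# [[ 5.  6.  7.  7.]
#  [ 9. 10. 11. 11.]
#  [13. 14. 15. 15.]
#  [13. 14. 15. 15.]]
```

For this toy layer the stitched result is the same as running the 2x2 max pooling with stride 1, which is the point of the trick: four coarse passes give one prediction per input pixel.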

In an FCN, the final output you get (by default, without using any upsampling tricks) is at a lower resolution than the input. Assume you have an input image of shape 100x100 and the network gives you an output of shape 10x10. Mapping the output directly to the input resolution will look patchy (even with high-order interpolation).
Now, you take the same input, shift it a bit, get the output, and repeat this process multiple times. You end up with a set of output images and a shift vector corresponding to each output. These output images, together with their shift vectors, can be combined (stitched) to get better resolution in the final semantic map.
One might think of it as taking multiple (shifted) low-resolution images of an object and combining (stitching) them to get a higher resolution image.
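To make the 100x100 to 10x10 case concrete, here is a rough sketch that compares naive upsampling with stitching. Everything in it is an assumption made for illustration: `toy_net` is a hypothetical stand-in for an FCN (it just averages 10x10 blocks with stride 10), the image is random, and the borders are handled by edge padding.

```python
import numpy as np

def toy_net(x, f=10):
    """Hypothetical stand-in for an FCN: average over f x f blocks with stride f."""
    h, w = x.shape
    return x.reshape(h // f, f, w // f, f).mean(axis=(1, 3))

def naive_upsample(coarse, f=10):
    """Nearest-neighbour upsampling: repeat each coarse value over an f x f patch."""
    return np.kron(coarse, np.ones((f, f)))

def stitched(x, f=10):
    """Evaluate toy_net on all f*f shifts and interlace the coarse outputs."""
    h, w = x.shape
    padded = np.pad(x, ((0, f - 1), (0, f - 1)), mode="edge")
    dense = np.empty((h, w))
    passes = 0
    for dy in range(f):
        for dx in range(f):
            dense[dy::f, dx::f] = toy_net(padded[dy:dy + h, dx:dx + w], f)
            passes += 1
    return dense, passes

rng = np.random.default_rng(0)
image = rng.random((100, 100))
coarse = toy_net(image)            # shape (10, 10)
patchy = naive_upsample(coarse)    # shape (100, 100), constant within each 10x10 patch
dense, passes = stitched(image)    # shape (100, 100), one prediction per pixel
```

The price of the denser output is the number of forward passes: with a downsampling factor of 10 per axis, `stitched` runs the network 100 times, once per shift.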