I am reading this document on tf.space_to_depth. It says the following about the use of the function:
This operation is useful for resizing the activations between convolutions (but keeping all data), e.g. instead of pooling. It is also useful for training purely convolutional models.
However, I still don't get a clear understanding of this. Why is it sometimes necessary to resize the activations in a model?
space_to_depth is a convolutional practice used very often for lossless spatial dimensionality reduction. Applied to a tensor of shape (example_dim, width, height, channels) with block_size = k, it produces a tensor of shape (example_dim, width / block_size, height / block_size, channels * block_size ** 2).
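For instance, a quick shape check (a minimal sketch, assuming TensorFlow 2.x, where the op is exposed as tf.nn.space_to_depth; the input is an arbitrary toy tensor):

import tensorflow as tf

x = tf.random.normal([1, 4, 4, 1])          # (example_dim, width, height, channels)
y = tf.nn.space_to_depth(x, block_size=2)   # block_size = k = 2
print(y.shape)                              # (1, 2, 2, 4): spatial dims halved, channels * 4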
It works in the following manner (example_dim is skipped for simplicity):
Cut the image / feature map into chunks of size (block_size, block_size, channels), e.g. the following image (with block_size = 2):
[[[1],  [2],  [3],  [4]],
 [[5],  [6],  [7],  [8]],
 [[9],  [10], [11], [12]],
 [[13], [14], [15], [16]]]
is divided into the following chunks:
[[[1], [2]],      [[[3], [4]],
 [[5], [6]]]       [[7], [8]]]

[[[9], [10]],     [[[11], [12]],
 [[13], [14]]]     [[15], [16]]]
Flatten each chunk to a single array:
[[1, 2, 5, 6]], [[3, 4, 7, 8]]
[[9, 10, 13, 14]], [[11, 12, 15, 16]]
Spatially rearrange chunks according to their initial position:
[[[1, 2, 5, 6],    [3, 4, 7, 8]],
 [[9, 10, 13, 14], [11, 12, 15, 16]]]
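These three steps can be reproduced with plain reshapes and a single transpose; here is a sketch (NumPy for the manual path, the built-in op for comparison; all variable names are mine):

import numpy as np
import tensorflow as tf

k = 2                                   # block_size
x = np.arange(1, 17).reshape(4, 4, 1)   # the (4, 4, 1) image from above

# step 1: cut into (k, k, channels) chunks by splitting both spatial axes
chunks = x.reshape(4 // k, k, 4 // k, k, 1)
# steps 2 + 3: flatten each chunk into the channel axis while keeping
# the chunks at their original spatial positions
manual = chunks.transpose(0, 2, 1, 3, 4).reshape(4 // k, 4 // k, k * k * 1)
print(manual)   # [[[1 2 5 6] [3 4 7 8]] [[9 10 13 14] [11 12 15 16]]]

# the built-in op (with the batch dimension added back) agrees
tf_out = tf.nn.space_to_depth(x[np.newaxis].astype(np.float32), block_size=k)
print(np.array_equal(manual, tf_out[0].numpy()))  # True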
So, as you can see, the initial image of size (4, 4, 1) was rearranged into a feature map of shape (2, 2, 4).
This strategy is usually used in applications like object detection, segmentation, or super-resolution, where it's important to decrease the spatial size of an image without losing information (unlike pooling). An example of an application of this technique can be found e.g. here.
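To make the "instead of pooling" point from the documentation concrete, here is a sketch contrasting the two downsampling routes (hypothetical tensor sizes; the trailing 1x1 convolution is one common way to mix the packed channels back down):

import tensorflow as tf

x = tf.random.normal([8, 32, 32, 16])   # a batch of activations

# lossy route: max pooling discards 3 of every 4 activations
pooled = tf.nn.max_pool2d(x, ksize=2, strides=2, padding="SAME")
print(pooled.shape)                     # (8, 16, 16, 16)

# lossless route: every activation survives, packed into channels
packed = tf.nn.space_to_depth(x, block_size=2)
print(packed.shape)                     # (8, 16, 16, 64)

# a learned 1x1 convolution can then decide how to combine the packed values
mixed = tf.keras.layers.Conv2D(filters=16, kernel_size=1)(packed)
print(mixed.shape)                      # (8, 16, 16, 16), with nothing discarded blindly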