I've gone through the official docs. I'm having a hard time understanding what this function is used for and how it works. Can someone explain this in layman's terms?
I get an error for the example they provide, although the PyTorch version I'm using matches the documentation. Perhaps fixing the error, which I did, is supposed to teach me something? The snippet given in the documentation is:
fold = nn.Fold(output_size=(4, 5), kernel_size=(2, 2))
input = torch.randn(1, 3 * 2 * 2, 1)
output = fold(input)
output.size()
and the fixed snippet is:
fold = nn.Fold(output_size=(4, 5), kernel_size=(2, 2))
input = torch.randn(1, 3 * 2 * 2, 3 * 2 * 2)
output = fold(input)
output.size()
Thanks!
unfold (the tensor method, Tensor.unfold(dimension, size, step)) imagines a tensor as a longer tensor with repeated columns/rows of values 'folded' on top of each other, which is then "unfolded": size determines how large each fold is, and step determines how far apart consecutive folds start.
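To make size and step concrete, here is a small sketch using Tensor.unfold on a 1-D tensor. The variable names (x, overlapping, disjoint) are just for illustration.

```python
import torch

x = torch.arange(1.0, 8.0)  # tensor([1., 2., 3., 4., 5., 6., 7.])

# Windows of length 2, advancing 1 element at a time: windows overlap.
overlapping = x.unfold(0, 2, 1)   # shape (6, 2)

# Windows of length 2, advancing 2 elements at a time: no overlap,
# and the trailing 7 is dropped because it cannot fill a window.
disjoint = x.unfold(0, 2, 2)      # shape (3, 2)

print(overlapping)
print(disjoint)
```

With step smaller than size the same values show up in several windows, which is the "repeated values" picture described above.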
Unfolding. Also called matricization, unfolding a tensor is done by reading the elements in a given order so as to obtain a matrix instead of a tensor. For a tensor of size $(I_0, I_1, \cdots, I_N)$, the n-mode unfolding of this tensor will be of size $(I_n,\ I_0 \times I_1 \times \cdots \times I_{n-1} \times I_{n+1} \times \cdots \times I_N)$.
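This kind of unfolding can be sketched in PyTorch by moving mode n to the front and flattening everything else. The helper name mode_n_unfold is made up for this example, and the column ordering it produces is only one convention; other orderings are also in use.

```python
import torch

def mode_n_unfold(t: torch.Tensor, n: int) -> torch.Tensor:
    """Move mode n to the front, then flatten the remaining modes into columns."""
    return torch.movedim(t, n, 0).reshape(t.shape[n], -1)

t = torch.arange(24).reshape(2, 3, 4)   # sizes (I_0, I_1, I_2) = (2, 3, 4)
m = mode_n_unfold(t, 1)                 # size (I_1, I_0 * I_2) = (3, 8)
print(m.shape)
```

Note that this is a different operation from the sliding-window unfold: no element is repeated, the tensor is only rearranged into a matrix.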
unfold and fold are used to facilitate "sliding window" operations (like convolutions). Suppose you want to apply a function foo to every 5x5 window in a feature map/image:
from torch.nn import functional as f
windows = f.unfold(x, kernel_size=5)
Now windows has size batch x (5*5*x.size(1)) x num_windows, and you can apply foo on windows:
processed = foo(windows)
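As a rough check of those shapes, here is a sketch with a concrete input. The input size and the body of foo are assumptions chosen for illustration; any function that keeps the shape of windows would do.

```python
import torch
from torch.nn import functional as f

x = torch.randn(2, 3, 8, 8)            # batch=2, channels=3, 8x8 image
windows = f.unfold(x, kernel_size=5)   # one column per 5x5 window

# With stride 1 and no padding there are (8 - 5 + 1) ** 2 = 16 windows,
# and each column holds 3 * 5 * 5 = 75 values.
print(windows.shape)                   # torch.Size([2, 75, 16])

def foo(w):
    # Hypothetical per-window function: subtract each window's mean,
    # taken over all channels and positions inside that window.
    return w - w.mean(dim=1, keepdim=True)

processed = foo(windows)               # same shape as windows
```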
Now you need to "fold" processed back to the original size of x:
out = f.fold(processed, x.shape[-2:], kernel_size=5)
You need to take care of padding and kernel_size (and stride and dilation, if you set them), since they determine how many windows there are and therefore whether you can "fold" processed back to the size of x.
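This is also what goes wrong in the snippet from the question: fold expects the last dimension of its input to equal the number of windows that fit into output_size. Here is a sketch of that count, following the shape formula in the PyTorch docs; the helper name num_blocks is made up for this example.

```python
import math
import torch
from torch import nn

def num_blocks(size, kernel_size, stride=1, padding=0, dilation=1):
    """Number of sliding windows per spatial dim, multiplied together."""
    return math.prod(
        (s + 2 * padding - dilation * (k - 1) - 1) // stride + 1
        for s, k in zip(size, kernel_size)
    )

L = num_blocks((4, 5), (2, 2))          # (4 - 2 + 1) * (5 - 2 + 1) = 12
fold = nn.Fold(output_size=(4, 5), kernel_size=(2, 2))
output = fold(torch.randn(1, 3 * 2 * 2, L))
print(output.size())                    # torch.Size([1, 3, 4, 5])
```

So the "fixed" snippet works because 3 * 2 * 2 happens to equal 12, the number of 2x2 windows in a 4x5 output, not because the last dimension has to match the channel dimension.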
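Moreover, fold sums over overlapping elements, so you will usually want to divide the output of fold by the number of windows covering each position. At stride 1 that count is at most the patch size, reached away from the borders; positions near the edges are covered by fewer windows.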
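One way to get the right divisor at every position, sketched here under the assumption of stride 1 and no padding, is to unfold and fold a tensor of ones and divide by the result. The names counts and restored are just for illustration.

```python
import torch
from torch.nn import functional as f

x = torch.randn(1, 3, 12, 12)
windows = f.unfold(x, kernel_size=5)
summed = f.fold(windows, x.shape[-2:], kernel_size=5)

# Count how many windows cover each pixel by folding a tensor of ones.
counts = f.fold(f.unfold(torch.ones_like(x), kernel_size=5),
                x.shape[-2:], kernel_size=5)

restored = summed / counts
print(torch.allclose(restored, x, atol=1e-5))

# A corner pixel sits in 1 window, an interior pixel in 5 * 5 = 25.
print(counts[0, 0, 0, 0].item(), counts[0, 0, 6, 6].item())   # 1.0 25.0
```

If foo changed the values in windows, the same counts tensor still gives the per-position average of the overlapping results.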