I am experimenting with dilation in convolution, where I am trying to copy data from one 2D tensor to another 2D tensor using PyTorch. I'm copying values from tensor `A` to tensor `B` such that every element of `A` that is copied into `B` is surrounded by `n` zeros.
I have already tried using nested `for` loops, which is a very naive way. The performance is obviously quite bad when I'm using a large number of grayscale images as input.
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        B[n+i][n+j] = A[i][j]
Is there anything faster that doesn't require the use of loops?
If I understand your question correctly, here is a faster alternative, without any loops:
# sample `n`
In [108]: n = 2
# sample tensor to work with
In [102]: A = torch.arange(start=1, end=5*4 + 1).view(5, -1)
In [103]: A
Out[103]:
tensor([[ 1,  2,  3,  4],
        [ 5,  6,  7,  8],
        [ 9, 10, 11, 12],
        [13, 14, 15, 16],
        [17, 18, 19, 20]])
# our target tensor where we will copy values
# `B` is larger by 2*n along each axis, since we pad by `n` on both sides
In [104]: B = torch.zeros(A.shape[0] + 2*n, A.shape[1] + 2*n)
# copy the values into the center of the grid,
# leaving a border of `n` zeros on every side
In [106]: B[n:-n, n:-n] = A
# check whether we did it correctly
In [107]: B
Out[107]:
tensor([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  1.,  2.,  3.,  4.,  0.,  0.],
        [ 0.,  0.,  5.,  6.,  7.,  8.,  0.,  0.],
        [ 0.,  0.,  9., 10., 11., 12.,  0.,  0.],
        [ 0.,  0., 13., 14., 15., 16.,  0.,  0.],
        [ 0.,  0., 17., 18., 19., 20.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])
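To make the shape arithmetic explicit: each axis grows by `n` on both sides, so the 5x4 input with `n = 2` yields a 9x8 result. A quick check along those lines (added here for illustration, not captured from the original session):
# each axis grows by `n` on both sides: (5 + 2*2, 4 + 2*2) == (9, 8)
assert B.shape == (A.shape[0] + 2*n, A.shape[1] + 2*n)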
Here is another case, where `n = 3`:
In [118]: n = 3
# again, `B` is larger by 2*n along each axis, since we pad by `n` on both sides
In [119]: B = torch.zeros(A.shape[0] + 2*n, A.shape[1] + 2*n)
# copy the values into the center of the grid,
# leaving a border of `n` zeros on every side
In [120]: B[n:-n, n:-n] = A
In [121]: B
Out[121]:
tensor([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  1.,  2.,  3.,  4.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  5.,  6.,  7.,  8.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  9., 10., 11., 12.,  0.,  0.,  0.],
        [ 0.,  0.,  0., 13., 14., 15., 16.,  0.,  0.,  0.],
        [ 0.,  0.,  0., 17., 18., 19., 20.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])
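Since the question mentions processing a large number of grayscale images, the same slice assignment also works on a whole batch at once; a minimal sketch, assuming the images are stacked along the first dimension of a hypothetical `imgs` tensor:
import torch

# hypothetical batch of 1000 grayscale images, each 5x4
imgs = torch.randn(1000, 5, 4)
n = 2

# only the two spatial axes grow by 2*n; the batch axis stays the same
out = torch.zeros(imgs.shape[0], imgs.shape[1] + 2*n, imgs.shape[2] + 2*n)

# copy every image into the center of its padded grid in a single assignment
out[:, n:-n, n:-n] = imgs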
Sanity check with your loop-based solution:
In [122]: n = 2
In [123]: B = torch.zeros(A.shape[0] + 2*n, A.shape[1] + 2*n)
In [124]: for i in range(A.shape[0]):
     ...:     for j in range(A.shape[1]):
     ...:         B[n+i][n+j] = A[i][j]
     ...:
In [125]: B
Out[125]:
tensor([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  1.,  2.,  3.,  4.,  0.,  0.],
        [ 0.,  0.,  5.,  6.,  7.,  8.,  0.,  0.],
        [ 0.,  0.,  9., 10., 11., 12.,  0.,  0.],
        [ 0.,  0., 13., 14., 15., 16.,  0.,  0.],
        [ 0.,  0., 17., 18., 19., 20.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])
Timings:
# large sized input tensor
In [126]: A = torch.arange(start=1, end=5000*4 + 1).view(5000, -1)
In [127]: n = 2
In [132]: B = torch.zeros(A.shape[0] + 2*n, A.shape[1] + 2*n)
# loopy solution
In [133]: %%timeit
     ...: for i in range(A.shape[0]):
     ...:     for j in range(A.shape[1]):
     ...:         B[n+i][n+j] = A[i][j]
     ...:
92.1 ms ± 434 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# clear out `B` again by reinitializing it.
In [128]: B = torch.zeros(A.shape[0] + 2*n, A.shape[1] + 2*n)
In [129]: %timeit B[n:-n, n:-n] = A
49.6 µs ± 239 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
From the above timings, we can see that the vectorized approach is roughly 1850x faster (92.1 ms vs. 49.6 µs) than the loop-based solution.
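As an aside, if you'd rather not allocate `B` yourself, `torch.nn.functional.pad` can produce the same zero-padded layout in one call; a sketch of that route (note it keeps `A`'s dtype, so here it returns an integer tensor rather than the float `B` built above):
import torch.nn.functional as F

# pad the last axis by (n, n) and the second-to-last axis by (n, n) with zeros
B_padded = F.pad(A, (n, n, n, n), mode='constant', value=0)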