I am trying to get some input on MATLAB's conv2
function. Suppose that we have an image I
of dimensions 5 x 5
and a kernel K
that is 3 x 3
.
conv2(I,K)
would return a 7 x 7
matrix. What extra operations are being done that I am not aware of? I can totally understand how conv2(I,K,'valid')
and conv2(I,K,'same')
works from a mathematical point of view. However, the default operation returns a larger matrix. Does anyone know what it actually does?
If you know how the 'valid
' flag and 'same'
flag work, then it's not that far of a stretch to go to what the default option is, which is the 'full'
option. As you slide the kernel across the image / matrix, as soon as at least one element from the kernel touches any element from the image / matrix, that is considered valid output. The output of the operation is dictated by the centre of where the kernel is when there is a valid output. For example, take a look at the following 5 x 5 image I
below with an example 3 x 3 kernel K
:
I = [1 2 3 4 5 ] K = [1 0 1]
[6 7 8 9 10] [1 0 1]
[11 12 13 14 15] [1 0 1]
[16 17 18 19 20]
[21 22 23 24 25]
Note that the numbers aren't that important but they're used for illustration. Also note that the kernel is symmetric and so performing a 180 degree rotation results in the same kernel. This is required of convolution before we start. In the 'full'
configuration, we slide the kernel from the top left to bottom right in a left to right, up to down fashion. The output of the first element in the output matrix happens when the bottom right of the kernel touches the top left of the image / matrix:
[1 0 1]
[1 `0` 1]
[1 0 [1*1] 2 3 4 5]
[6 7 8 9 10]
[11 12 13 14 15]
[16 17 18 19 20]
[21 22 23 24 25]
Note that the centre of the kernel as we sweep across the image is the location where we need to output in the image, denoted by the `` symbols. Remember that to compute convolution here, we find the weighted and element-wise sum of products between each element in the kernel and where it touches in the matrix / image.
Note that for any elements of the kernel that are out of bounds we ignore and so the output is simply where the bottom right of the kernel and the top left of the image touch and we multiply these elements together. The output is simply 1*1 = 1
. Now let's move over to the next element, which is 1 to the right:
[ 1 0 1]
[ 1 `0` 1]
[1 [0*1] [2*1] 3 4 5 ]
[6 7 8 9 10]
[11 12 13 14 15]
[16 17 18 19 20]
[21 22 23 24 25]
Note where the centre is as well as what elements the kernel touches the matrix. The output is thus 0*1 + 2*1 = 2
. You would continue this until you hit the end of this row where the bottom left of the kernel touches the top right of the image. You would then move down to the next row, repeat the sweep over all of the columns and continue up until the very end until the top left of the kernel touches the bottom right of the image / matrix.
Here are a couple more examples just to be sure you have the theory right. Let's do the point where the kernel touches the top right of the image / matrix
[ 1 0 1]
[ 1 `0` 1]
[1 2 3 4 [5*1]] 0 1]
[6 7 8 9 10]
[11 12 13 14 15]
[16 17 18 19 20]
[21 22 23 24 25]
Remember that we ignore all places where the kernel doesn't touch the image / matrix. The output in this case would simply be 5 and also note where the output location is. Here's another example:
[1 2 3 4 5 ]
[6 7 8 9 10]
[11 12 13 14 15]
[16 17 18 19 20]
[1 0 [[21*1] 22 23 24 25]
[1 `0` 1]
[1 0 1]
This location is at the bottom left corner of the image / matrix and the output here would simply be 21*1
. Another one just to be sure:
[1 2 3 4 5]
[6 7 8 9 10]
[11 12 13 14 [1*15]] 0 1]
[16 17 18 19 [1*20]] `0` 1]
[21 22 23 24 [1*25]] 0 1]
This location is a bit more complicated. The kernel overlaps the image / matrix fully by its first column and so the output is simply 1*15 + 1*20 + 1*25 = 60
. Also note that the output position is at the third-last row because there are still two more rows of filtering to perform. One where the first two rows of the kernel are touching the bottom last two rows of the image / matrix and one where the first row of the kernel touches the bottom last row of the image / matrix.
Therefore, the final output matrix would look something like this.
[1 2 * * * * 5 ]
[* * * * * * * ]
[* * * * * * * ]
[* * * * * * * ]
[* * * * * * 60]
[* * * * * * *]
[21 * * * * * *]
The elements marked as *
are unknown as I haven't calculated those, but the point is to notice what the final size of the matrix is. Specifically, note where the output positions are of where we need to write to the matrix for the first couple of cases that you see above. This is the reason why you get a larger matrix - to accommodate for the results when the kernel is not fully contained within the image / matrix itself but still performs valid operations. As you can see, you would need two additional rows: 1 for the top and 1 for the bottom, and two additional columns: 1 for the left and 1 for the right. This results in a (5 + 2) x (5 + 2) = 7 x 7
output matrix. In general if the kernel size is odd, the output you get from use 'full'
2D convolution is usually (rows + 2*floor(kernel_rows/2)) x (cols + 2*floor(kernel_cols/2))
where rows
and cols
are the rows and columns of the image / matrix to filter and kernel_rows
and kernel_cols
are the rows and columns of the kernel.
If you want to have a look at what MATLAB actually produces, we can. Using the input image / matrix and kernel I defined earlier, this is what we get:
>> I = reshape(1:25,5,5).'; %'
>> K = [1 0 1; 1 0 1; 1 0 1];
>> out = conv2(I,K)
out =
`1` `2` 4 6 8 4 `5`
7 9 18 22 26 13 15
18 21 42 48 54 27 30
33 36 72 78 84 42 45
48 51 102 108 114 57 `60`
37 39 78 82 86 43 45
`21` 22 44 46 48 24 25
Note that I've marked the sample calculations that we did in MATLAB's output with `` characters. This does agree with the calculations.
Now your true question is how 'valid'
and 'same'
factor into all this. Where 'valid'
and 'same'
come in are simply truncated versions of the 'full'
convolution. 'same'
gives you an output that is the same size as the image / matrix to be filtered and 'valid'
gives you an output such that you only provide outputs where the kernel was fully contained inside the image / matrix. Any time the kernel is out of bounds with respect to the image / matrix, we do not include those outputs as part of the final output. Simply put, 'valid'
and 'same'
use the 'full'
result but remove certain portions of the borders of the result to facilitate the option that you choose.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With