strides: A list of ints that has length >= 4. The stride of the sliding window for each dimension of the input tensor.
expand_dims() is used to insert an additional dimension into the input Tensor. Parameters: input: the input Tensor. axis: the index at which the new dimension should be inserted.
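To make the effect on the shape concrete, here is a minimal pure-Python sketch. It only mimics the shape bookkeeping and does not call TensorFlow; the helper name expand_dims_shape is made up for illustration and is not part of any library.

```python
def expand_dims_shape(shape, axis):
    """Return the shape obtained by inserting a size-1 dimension at `axis`.

    Hypothetical helper: it mirrors the shape change only, not the data.
    """
    rank = len(shape)
    # A negative axis counts from the end of the *output* shape,
    # so -1 appends the new dimension at the very end.
    if axis < 0:
        axis += rank + 1
    if not 0 <= axis <= rank:
        raise ValueError(f"axis out of range for a rank-{rank} shape")
    return list(shape[:axis]) + [1] + list(shape[axis:])


image_shape = [32, 32]                              # height, width
with_channel = expand_dims_shape(image_shape, -1)   # add a channel dimension
with_batch = expand_dims_shape(with_channel, 0)     # add a batch dimension
print(with_channel, with_batch)
```

This is the usual way a single greyscale image ends up in the 4-D [batch, height, width, channels] layout that conv2d expects.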
conv2d(): Compute a 2-D Convolution in TensorFlow – TensorFlow Tutorial. By admin | August 9, 2020. The TensorFlow tf.nn.conv2d() function is widely used to build convolutional networks in deep learning.
The pooling and convolutional ops slide a "window" across the input tensor. Using tf.nn.conv2d as an example: If the input tensor has 4 dimensions: [batch, height, width, channels], then the convolution operates on a 2D window on the height, width dimensions.
strides determines how much the window shifts by in each of the dimensions. The typical use sets the first (the batch) and last (the depth) stride to 1.
Let's use a very concrete example: Running a 2-d convolution over a 32x32 greyscale input image. I say greyscale because then the input image has depth=1, which helps keep it simple. Let that image look like this:
00 01 02 03 04 ...
10 11 12 13 14 ...
20 21 22 23 24 ...
30 31 32 33 34 ...
...
Let's run a 2x2 convolution window over a single example (batch size = 1). We'll give the convolution an output channel depth of 8.
The input to the convolution has shape=[1, 32, 32, 1].
If you specify strides=[1,1,1,1] with padding=SAME, then the output of the filter will be [1, 32, 32, 8].
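The shape arithmetic can be sketched in plain Python. This assumes the padding rules as I understand them from the TensorFlow docs (SAME: ceil(in / stride); VALID: ceil((in - filter + 1) / stride)) and the [filter_height, filter_width, in_channels, out_channels] filter layout. The function name conv2d_output_shape is made up for this example.

```python
import math


def conv2d_output_shape(input_shape, filter_shape, strides, padding):
    """Output shape of a 2-D convolution on an NHWC input (sketch only).

    input_shape:  [batch, height, width, in_channels]
    filter_shape: [filter_height, filter_width, in_channels, out_channels]
    strides:      [1, stride_h, stride_w, 1]
    """
    batch, in_h, in_w, _ = input_shape
    f_h, f_w, _, out_channels = filter_shape
    _, s_h, s_w, _ = strides
    if padding == "SAME":
        # Input is padded so that only the stride shrinks the output.
        out_h = math.ceil(in_h / s_h)
        out_w = math.ceil(in_w / s_w)
    elif padding == "VALID":
        # No padding: the window must fit entirely inside the input.
        out_h = math.ceil((in_h - f_h + 1) / s_h)
        out_w = math.ceil((in_w - f_w + 1) / s_w)
    else:
        raise ValueError("padding must be 'SAME' or 'VALID'")
    return [batch, out_h, out_w, out_channels]


input_shape = [1, 32, 32, 1]    # one 32x32 greyscale image
filter_shape = [2, 2, 1, 8]     # 2x2 window, 8 output channels
same_stride1 = conv2d_output_shape(input_shape, filter_shape, [1, 1, 1, 1], "SAME")
same_stride2 = conv2d_output_shape(input_shape, filter_shape, [1, 2, 2, 1], "SAME")
valid_stride1 = conv2d_output_shape(input_shape, filter_shape, [1, 1, 1, 1], "VALID")
print(same_stride1, same_stride2, valid_stride1)
```

With strides=[1,2,2,1] the height and width are halved, while the batch and channel dimensions are untouched.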
The filter will first create an output for:
F(00 01
10 11)
And then for:
F(01 02
11 12)
and so on. Then it will move to the second row, calculating:
F(10, 11
20, 21)
then
F(11, 12
21, 22)
If you specify a stride of [1, 2, 2, 1] it won't do overlapping windows. It will compute:
F(00, 01
10, 11)
and then
F(02, 03
12, 13)
The stride operates similarly for the pooling operators.
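If it helps, the windows above can be enumerated with a few lines of plain Python. This sketch ignores padding (it only lists windows that fit entirely inside the image) and uses a 4x4 corner of the image; the helper names are made up for illustration.

```python
def window_corners(height, width, win, stride):
    """Top-left (row, col) of every win x win window that fits in the image."""
    return [(r, c)
            for r in range(0, height - win + 1, stride)
            for c in range(0, width - win + 1, stride)]


def window_labels(corner, win):
    """Labels of the pixels covered by one window, e.g. '01' = row 0, col 1."""
    r, c = corner
    return [[f"{r + dr}{c + dc}" for dc in range(win)] for dr in range(win)]


# Stride 1: neighbouring windows overlap.
stride1 = window_corners(4, 4, win=2, stride=1)
# Stride 2: with a 2x2 window, no pixel is visited twice.
stride2 = window_corners(4, 4, win=2, stride=2)

for corner in stride2:
    print("F", window_labels(corner, 2))
```

The second window printed for stride 2 covers 02, 03, 12, 13, matching the walk-through above.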
Question 2: Why strides [1, x, y, 1] for convnets
The first 1 is the batch: You don't usually want to skip over examples in your batch, or you shouldn't have included them in the first place. :)
The last 1 is the depth of the convolution: You don't usually want to skip inputs, for the same reason.
The conv2d operator is more general, so you could create convolutions that slide the window along other dimensions, but that's not a typical use in convnets. The typical use is to use them spatially.
Why reshape to -1
-1 is a placeholder that says "adjust as necessary to match the size needed for the full tensor." It's a way of making the code be independent of the input batch size, so that you can change your pipeline and not have to adjust the batch size everywhere in the code.
The inputs are 4 dimensional and are of form: [batch_size, image_rows, image_cols, number_of_colors]
Strides, in general, define an overlap between applying operations. In the case of conv2d, it specifies what is the distance between consecutive applications of convolutional filters. The value of 1 in a specific dimension means that we apply the operator at every row/col, the value of 2 means every second, and so on.
Re 1) The values that matter for convolutions are 2nd and 3rd and they represent the overlap in the application of the convolutional filters along rows and columns. The value of [1, 2, 2, 1] says that we want to apply the filters on every second row and column.
Re 2) I don't know the technical limitations (might be CuDNN requirement) but typically people use strides along the rows or columns dimensions. It doesn't necessarily make sense to do it over batch size. Not sure of the last dimension.
Re 3) Setting -1 for one of the dimensions means "set the value for that dimension so that the total number of elements in the tensor is unchanged". In our case, the -1 is the first dimension and will be equal to the batch_size.
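Here is a rough sketch of how the -1 gets filled in. It is plain Python that only works on shapes, not on real tensors, and resolve_shape is a made-up name. The 28x28 image size and batch of 50 are assumptions picked for the example.

```python
def resolve_shape(num_elements, shape):
    """Replace a single -1 in `shape` so the element count stays the same."""
    unknown = [i for i, dim in enumerate(shape) if dim == -1]
    if len(unknown) > 1:
        raise ValueError("at most one dimension can be -1")
    known = 1
    for dim in shape:
        if dim != -1:
            known *= dim
    if not unknown:
        if known != num_elements:
            raise ValueError("shape does not match the number of elements")
        return list(shape)
    if known == 0 or num_elements % known != 0:
        raise ValueError("cannot infer the -1 dimension")
    resolved = list(shape)
    resolved[unknown[0]] = num_elements // known
    return resolved


# Assumed example: a batch of 50 flattened 28x28 greyscale images.
batch_size = 50
num_elements = batch_size * 28 * 28
as_images = resolve_shape(num_elements, [-1, 28, 28, 1])
flattened = resolve_shape(num_elements, [-1])
print(as_images, flattened)
```

Changing batch_size changes only the inferred first dimension, which is why the reshape call itself never has to mention the batch size.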
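Let's assume your input = [1, 0, 2, 3, 0, 1, 1] and kernel = [2, 1, 3]. The result of the convolution is [8, 11, 7, 9, 4], which is calculated by sliding your kernel over the input, performing element-wise multiplication and summing everything. Like this:
8 = 1 * 2 + 0 * 1 + 2 * 3
11 = 0 * 2 + 2 * 1 + 3 * 3
7 = 2 * 2 + 3 * 1 + 0 * 3
9 = 3 * 2 + 0 * 1 + 1 * 3
4 = 0 * 2 + 1 * 1 + 1 * 3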
Here we slide by one element, but nothing stops you from using any other number. This number is your stride. You can think about it as downsampling the result of the 1-strided convolution by just taking every s-th result.
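A small plain-Python sketch of the same thing, using the numbers from this answer. Note that, as in deep learning frameworks, the kernel is not flipped (strictly speaking this is cross-correlation), and no padding is applied. The function name conv1d is my own.

```python
def conv1d(signal, kernel, stride=1):
    """Slide `kernel` over `signal`, multiply element-wise and sum (no padding)."""
    k = len(kernel)
    return [sum(x * w for x, w in zip(signal[start:start + k], kernel))
            for start in range(0, len(signal) - k + 1, stride)]


signal = [1, 0, 2, 3, 0, 1, 1]
kernel = [2, 1, 3]

full = conv1d(signal, kernel)                # stride 1
strided = conv1d(signal, kernel, stride=2)   # stride 2

# A stride of 2 gives the same values as keeping every 2nd stride-1 result.
print(full, strided, full[::2])
```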
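Knowing the input size i, kernel size k, stride s and padding p you can easily calculate the output size o of the convolution as:
o = floor((i + 2p - k) / s) + 1
Here floor means rounding down to the nearest integer. For the example above, i = 7, k = 3, s = 1 and p = 0 give o = 5, which matches the five values in the result. For a pooling layer s = 1.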
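As a quick check, here is the output-size calculation as a one-line Python function. It assumes the common convention o = floor((i + 2p - k) / s) + 1, where p is the padding added on each side; the function name is made up.

```python
def conv_output_size(i, k, s=1, p=0):
    """Output length for input size i, kernel size k, stride s, padding p per side."""
    if i + 2 * p < k:
        raise ValueError("kernel does not fit into the padded input")
    # Integer division rounds down, which is the floor in the formula.
    return (i + 2 * p - k) // s + 1


no_stride = conv_output_size(7, 3)            # the 1-D example: 7 inputs, kernel of 3
stride_two = conv_output_size(7, 3, s=2)
padded = conv_output_size(7, 3, s=1, p=1)     # one zero on each side keeps the size
print(no_stride, stride_two, padded)
```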
Knowing the math for a 1-dim case, n-dim case is easy once you see that each dim is independent. So you just slide each dimension separately. Here is an example for 2-d. Notice that you do not need to have the same stride at all the dimensions. So for an N-dim input/kernel you should provide N strides.
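To illustrate the "each dimension is independent" point, here is a plain-Python sketch of a 2-D version with a separate stride per dimension. It works on nested lists, uses no padding, and conv2d_single is a made-up name (it handles one image and one kernel, nothing like the full tf.nn.conv2d).

```python
def conv2d_single(image, kernel, strides=(1, 1)):
    """2-D sliding-window sum of products with independent row/column strides."""
    s_h, s_w = strides
    k_h, k_w = len(kernel), len(kernel[0])
    out = []
    for r in range(0, len(image) - k_h + 1, s_h):         # slide along rows
        row = []
        for c in range(0, len(image[0]) - k_w + 1, s_w):  # slide along columns
            row.append(sum(image[r + dr][c + dc] * kernel[dr][dc]
                           for dr in range(k_h) for dc in range(k_w)))
        out.append(row)
    return out


# 4 rows x 5 columns; the value at (row, col) is row * 10 + col.
image = [[r * 10 + c for c in range(5)] for r in range(4)]
kernel = [[1, 0],
          [0, 1]]   # adds the top-left and bottom-right pixel of each window

# Stride 2 along rows, stride 1 along columns.
result = conv2d_single(image, kernel, strides=(2, 1))
print(result)
```

The output has 2 rows and 4 columns: the row count follows the 1-D rule with s = 2 and the column count follows it with s = 1.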
If one component of shape is the special value -1, the size of that dimension is computed so that the total size remains constant. In particular, a shape of [-1] flattens into 1-D. At most one component of shape can be -1.
@dga has done a wonderful job explaining and I can't say enough how helpful it has been. In like manner, I would like to share my findings on how stride works in 3D convolution.
According to the TensorFlow documentation on conv3d, the shape of the input must be in this order:
[batch, in_depth, in_height, in_width, in_channels]
Let's explain the variables from the extreme right to the left using an example. Assuming the input shape is
input_shape = [1000,16,112,112,3]
input_shape[4] is the number of colour channels (RGB or whichever format it is extracted in)
input_shape[3] is the width of the image
input_shape[2] is the height of the image
input_shape[1] is the number of frames that have been lumped into one complete data sample
input_shape[0] is the number of such lumped groups of frames we have (the batch).
Below is a summary documentation for how stride is used.
strides: A list of ints that has length >= 5. 1-D tensor of length 5. The stride of the sliding window for each dimension of input. Must have strides[0] = strides[4] = 1.
As indicated in many works, strides simply mean how many steps a window or kernel jumps from one position to the next, be it across data frames or pixels (this is paraphrased, by the way).
From the above documentation, a stride in 3D will look like this: strides = (1,X,Y,Z,1).
The documentation emphasizes that strides[0] = strides[4] = 1.
strides[0]=1 means that we do not want to skip any data in the batch
strides[4]=1 means that we do not want to skip in the channel
X (that is, strides[1]) means how many steps we take across the lumped frames. So for example, if we have 16 frames, X=1 means use every frame, X=2 means use every second frame, and so on
Y and Z (strides[2] and strides[3]) follow the explanation by @dga so I will not redo that part.
In keras, however, you only need to specify a tuple/list of 3 integers giving the strides of the convolution along each spatial dimension, where the spatial dimensions correspond to X, Y and Z above (strides[1], strides[2] and strides[3]). strides[0] and strides[4] already default to 1.
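To tie the two conventions together, here is a small plain-Python sketch. It assumes SAME padding (so each strided dimension becomes ceil(size / stride)) and an arbitrary 64 output channels; both helper names are made up and neither calls TensorFlow or Keras.

```python
import math


def spatial_to_full_strides(spatial_strides):
    """Turn a Keras-style 3-tuple into the 5-element form (1, X, Y, Z, 1)."""
    x, y, z = spatial_strides
    return [1, x, y, z, 1]


def conv3d_same_output_shape(input_shape, strides, out_channels):
    """Output shape under SAME padding for [batch, depth, height, width, channels]."""
    batch, depth, height, width, _ = input_shape
    _, s_d, s_h, s_w, _ = strides
    return [batch,
            math.ceil(depth / s_d),
            math.ceil(height / s_h),
            math.ceil(width / s_w),
            out_channels]


input_shape = [1000, 16, 112, 112, 3]

# X=2: step over every second frame, keep full height and width.
strides = spatial_to_full_strides((2, 1, 1))
output_shape = conv3d_same_output_shape(input_shape, strides, out_channels=64)
print(strides, output_shape)
```

Only the frame dimension shrinks here (16 to 8); the batch of 1000 is untouched because strides[0] is 1.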
I hope someone finds this helpful!