In the definition of tf.nn.max_pool, what is ksize used for?
tf.nn.max_pool(value, ksize, strides, padding, data_format='NHWC', name=None)
Performs the max pooling on the input.
Args:
value: A 4-D Tensor with shape [batch, height, width, channels] and type tf.float32.
ksize: A list of ints that has length >= 4. The size of the window for each dimension of the input tensor.
For instance, if the input value is a tensor of shape [1, 64, 64, 3] and ksize=3, what does that mean?
The documentation states:
ksize: A list of ints that has length >= 4. The size of the window for each dimension of the input tensor.
In general for images, your input is of shape [batch_size, 64, 64, 3] for an RGB image of 64x64 pixels.
The kernel size ksize will typically be [1, 2, 2, 1] if you have a 2x2 window over which you take the maximum. On the batch size dimension and the channels dimension, ksize is 1 because we don't want to take the maximum over multiple examples, or over multiple channels.
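To make the role of each entry in ksize concrete, here is a minimal pure-Python sketch of max pooling over nested lists laid out as [batch, height, width, channels]. It is not TensorFlow's implementation: the function name max_pool_nhwc is hypothetical, and it assumes no padding, so a window is only placed where it fits entirely inside the input.

```python
def max_pool_nhwc(value, ksize, strides):
    """Illustrative max pooling over nested lists in NHWC order.

    Hypothetical helper, not a TensorFlow function. Assumes no padding:
    a window is only placed where it fits entirely inside the input.
    """
    kb, kh, kw, kc = ksize      # window size along each of the 4 dimensions
    sb, sh, sw, sc = strides    # step between windows along each dimension
    batch = len(value)
    height = len(value[0])
    width = len(value[0][0])
    channels = len(value[0][0][0])

    def out_len(size, k, s):
        # Number of window positions that fit along one dimension.
        return (size - k) // s + 1

    return [
        [
            [
                [
                    # Maximum over the whole kb x kh x kw x kc window.
                    max(
                        value[b * sb + db][h * sh + dh][w * sw + dw][c * sc + dc]
                        for db in range(kb)
                        for dh in range(kh)
                        for dw in range(kw)
                        for dc in range(kc)
                    )
                    for c in range(out_len(channels, kc, sc))
                ]
                for w in range(out_len(width, kw, sw))
            ]
            for h in range(out_len(height, kh, sh))
        ]
        for b in range(out_len(batch, kb, sb))
    ]


# One 4x4 image with a single channel, i.e. shape [1, 4, 4, 1].
image = [[[[float(4 * r + c)] for c in range(4)] for r in range(4)]]

# 2x2 spatial window; size 1 on the batch and channel dimensions.
pooled = max_pool_nhwc(image, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1])
print(pooled)  # [[[[5.0], [7.0]], [[13.0], [15.0]]]]
```

Because the first and last entries of ksize are 1, each output value only ever looks at one example and one channel. Following the same reasoning, and given that the quoted documentation requires a list of length at least 4, a 3x3 window would presumably be written as ksize=[1, 3, 3, 1] rather than ksize=3.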