
What is output tensor of Max Pooling 2D Layer in TensorFlow?

I was trying to understand some TensorFlow basics and got stuck while reading the documentation for the max pooling 2D layer: https://www.tensorflow.org/tutorials/layers#pooling_layer_1

This is how max_pooling2d is specified:

pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)

where conv1 has a tensor with shape [batch_size, image_width, image_height, channels], concretely in this case it's [batch_size, 28, 28, 32].

So our input is a tensor with shape: [batch_size, 28, 28, 32].

My understanding of a max pooling 2D layer is that it applies a filter of size pool_size (2x2 in this case) and moves the sliding window by the stride (also 2 in each direction). This means that both the width and height of the image will be halved, i.e. we end up with 14x14 pixels per channel (32 channels in total), so the output is a tensor with shape [batch_size, 14, 14, 32].
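The shape arithmetic described above can be sketched in plain Python. This is only an illustration, not TensorFlow code: `pooled_shape` is a hypothetical helper, and it assumes 'valid' padding (the default for tf.layers.max_pooling2d) and a tensor laid out as [batch, height, width, channels].

```python
def pooled_shape(input_shape, pool_size=(2, 2), strides=(2, 2)):
    """Output shape of 2D max pooling with 'valid' padding (hypothetical helper)."""
    batch, height, width, channels = input_shape
    # Number of window positions that fit along each spatial axis.
    out_height = (height - pool_size[0]) // strides[0] + 1
    out_width = (width - pool_size[1]) // strides[1] + 1
    # Pooling runs per channel, so batch and channel sizes pass through unchanged.
    return (batch, out_height, out_width, channels)


print(pooled_shape((None, 28, 28, 32)))  # (None, 14, 14, 32)
print(pooled_shape((None, 14, 14, 64)))  # (None, 7, 7, 64)
```

Under these assumptions the channel dimension never enters the calculation, which is why 32 channels would be expected to stay 32.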

However, according to the above link, the shape of the output tensor is [batch_size, 14, 14, 1]:

Our output tensor produced by max_pooling2d() (pool1) has a shape of [batch_size, 14, 14, 1]: the 2x2 filter reduces width and height by 50%.

What am I missing here?

How was 32 converted to 1?

They apply the same logic later here: https://www.tensorflow.org/tutorials/layers#convolutional_layer_2_and_pooling_layer_2

but this time it's correct, i.e. [batch_size, 14, 14, 64] becomes [batch_size, 7, 7, 64] (number of channels is the same).

Nikola Stojiljkovic asked Apr 17 '17 14:04



2 Answers

Yes, a 2x2 max pool with a stride of 2 halves the width and height, and the output depth is not changed. Here is my test code based on your snippet; the output shape is (?, 14, 14, 32), so perhaps the documentation is wrong?

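#!/usr/bin/env python
# Written against the TensorFlow 1.x API (tf.placeholder, tf.layers).

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Not needed for the shape check itself; it produces the "Extracting" lines in the output.
mnist = input_data.read_data_sets('./MNIST_data/', one_hot=True)

# Stand-in for the conv1 output: [batch_size, 28, 28, 32]
conv1 = tf.placeholder(tf.float32, [None, 28, 28, 32])
pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)
print(pool1.get_shape())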

The output is:

Extracting ./MNIST_data/train-images-idx3-ubyte.gz
Extracting ./MNIST_data/train-labels-idx1-ubyte.gz
Extracting ./MNIST_data/t10k-images-idx3-ubyte.gz
Extracting ./MNIST_data/t10k-labels-idx1-ubyte.gz
(?, 14, 14, 32)
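For readers without a TensorFlow 1.x install, the same check can be approximated in NumPy. This is a minimal sketch, not the TensorFlow implementation: `max_pool_2x2` is a hypothetical helper that only handles a 2x2 window with stride 2 on arrays with even height and width, and the batch size of 4 is arbitrary.

```python
import numpy as np


def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a [batch, height, width, channels] array."""
    batch, height, width, channels = x.shape
    # Split each spatial axis into (blocks, 2), then take the max inside every 2x2 block.
    blocks = x.reshape(batch, height // 2, 2, width // 2, 2, channels)
    return blocks.max(axis=(2, 4))


conv1 = np.random.rand(4, 28, 28, 32).astype(np.float32)  # stand-in for the conv1 output
pool1 = max_pool_2x2(conv1)
print(pool1.shape)  # (4, 14, 14, 32)
```

The maximum is taken over the two spatial block axes only, so each of the 32 channels is pooled independently and the channel count is preserved.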
大宝剑 answered Oct 13 '22 00:10


Nikola, the documentation has been corrected, as you suspected.

  • Documentation fixes for TF Layers tutorial (see #8301)
  • Feedback on "A Guide to TF Layers: Building a Convolutional Neural Network" tutorial #8301

I came across this thread while learning the concepts of convolution and pooling. Thank you for your question, which led me to the informative documentation.

Tora answered Oct 13 '22 01:10