 

How does adaptive pooling in PyTorch work?

Tags:

python

pytorch

Adaptive pooling is a great function, but how does it work? It seems to insert pads or shrink/expand kernel sizes in what seems like a patterned but fairly arbitrary way. The PyTorch documentation I can find is not more descriptive than "put desired output size here." Does anyone know how this works, or can you point to where it's explained?

Some test code on a 1x1x6 tensor, (1,2,3,4,5,6), with an adaptive output of size 8:

import torch
import torch.nn as nn

class TestNet(nn.Module):
    def __init__(self):
        super(TestNet, self).__init__()
        self.avgpool = nn.AdaptiveAvgPool1d(8)

    def forward(self, x):
        print(x)
        x = self.avgpool(x)
        print(x)
        return x

def test():
    x = torch.Tensor([[[1, 2, 3, 4, 5, 6]]])
    net = TestNet()
    y = net(x)
    return y

test()

Output:

tensor([[[ 1.,  2.,  3.,  4.,  5.,  6.]]])
tensor([[[ 1.0000,  1.5000,  2.5000,  3.0000,  4.0000,  4.5000,  5.5000,
           6.0000]]])

If it mirror-pads by one on the left and right (operating on (1,1,2,3,4,5,6,6)) and has a kernel of 2, then the outputs for all positions except 4 and 5 make sense, except of course that the output isn't the right size. Is it also padding the 3 and 4 internally? If so, it's operating on (1,1,2,3,3,4,4,5,6,6), which, with a size-2 kernel, produces the wrong output size and would also miss a 3.5 output. Is it changing the size of the kernel?

Am I missing something obvious about the way this works?

Asked Dec 18 '18 by S C




2 Answers

In general, pooling reduces dimensions. If you want to increase dimensions, you might want to look at interpolation.
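For the question's 6 -> 8 case, here is a minimal sketch of that alternative using torch.nn.functional.interpolate (illustrative only, not part of the original answer):

import torch
import torch.nn.functional as F

x = torch.Tensor([[[1., 2., 3., 4., 5., 6.]]])
# linearly interpolate the length-6 signal up to length 8
y = F.interpolate(x, size=8, mode='linear', align_corners=True)
print(y.shape)  # torch.Size([1, 1, 8])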

Anyway, let's talk about adaptive pooling in general. You can look at the source code here. Some have claimed that adaptive pooling is the same as standard pooling with stride and kernel size calculated from the input and output size. Specifically, the following parameters are used:

  1. Stride = (input_size//output_size)
  2. Kernel size = input_size - (output_size-1)*stride
  3. Padding = 0

These are worked backwards from the standard pooling formula, output_size = (input_size - kernel_size) // stride + 1 (with padding = 0). While they DO produce output of the desired size, the output is not necessarily the same as that of adaptive pooling. Here is a test snippet:

import torch
import torch.nn as nn

in_length = 5
out_length = 3

x = torch.arange(0, in_length).view(1, 1, -1).float()
print(x)

stride = in_length // out_length
avg_pool = nn.AvgPool1d(
    stride=stride,
    kernel_size=(in_length - (out_length - 1) * stride),
    padding=0,
)
adaptive_pool = nn.AdaptiveAvgPool1d(out_length)

print(avg_pool.stride, avg_pool.kernel_size)

y_avg = avg_pool(x)
y_ada = adaptive_pool(x)

print(y_avg)
print(y_ada)
# total absolute difference between the two outputs
# (reconstructed to match the "Error" line in the output below)
print('Error: ', (y_avg - y_ada).abs().sum().item())

Output:

tensor([[[0., 1., 2., 3., 4.]]])
(1,) (3,)
tensor([[[1., 2., 3.]]])
tensor([[[0.5000, 2.0000, 3.5000]]])
Error:  1.0

Average pooling pools from elements (0, 1, 2), (1, 2, 3) and (2, 3, 4).

Adaptive pooling pools from elements (0, 1), (1, 2, 3) and (3, 4). (Change the code a bit to see that it is not pooling from (2) only)
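One quick way to verify this (a minimal probe, not part of the original answer): feed one-hot inputs through the adaptive layer; the nonzero output positions reveal which kernels each input element belongs to.

import torch
import torch.nn as nn

pool = nn.AdaptiveAvgPool1d(3)
for i in range(5):
    x = torch.zeros(1, 1, 5)
    x[0, 0, i] = 1.0             # one-hot probe at input position i
    print(i, pool(x).flatten())  # nonzero entries mark the kernels containing i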

  • You can tell that adaptive pooling tries to reduce overlap between kernels.
  • The difference can be mitigated using padding with count_include_pad=False (see the sketch after this list), but in general I don't think the two can be made exactly the same for 2D or higher for all input/output sizes. One would need different paddings for left/right, which is not supported in pooling layers at the moment.
  • From a practical perspective it should not matter much.
  • Check the code for the actual implementation.
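For instance, in the 5 -> 3 case above, symmetric zero padding combined with count_include_pad=False happens to reproduce the adaptive output exactly (a case-specific sketch, not from the original answer):

import torch
import torch.nn as nn

x = torch.arange(0, 5).view(1, 1, -1).float()

# kernel 3, stride 2, one zero pad on each side; with count_include_pad=False the
# pads are excluded from the averages, giving divisors 2, 3, 2 -- the same as
# adaptive pooling's kernels (0, 1), (1, 2, 3), (3, 4)
manual = nn.AvgPool1d(kernel_size=3, stride=2, padding=1, count_include_pad=False)
adaptive = nn.AdaptiveAvgPool1d(3)

print(manual(x))    # tensor([[[0.5000, 2.0000, 3.5000]]])
print(adaptive(x))  # tensor([[[0.5000, 2.0000, 3.5000]]])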
Answered Oct 06 '22 by hkchengrex


As hkchengrex's answer points out, the PyTorch documentation does not explain what rule adaptive pooling layers use to determine the size and locations of the pooling kernels. (In fact, there is a FIXME in the PyTorch code indicating the documentation needs to be improved.)

However, the calculation of the kernel sizes and locations is implemented by this C++ function, and the key logic is actually in the calls to the functions start_index and end_index, which define each kernel's location and extent.

I believe this Python code re-implements that code and shows how kernels are calculated:

from typing import List
import math

def kernels(ind, outd) -> List:
    """Returns a List [(kernel_offset_start, kernel_length)] defining all the pooling
    kernels for a 1-D adaptive pooling layer that takes an input of dimension `ind`
    and yields an output of dimension `outd`."""
    def start_index(a, b, c):
        return math.floor((float(a) * float(c)) / b)
    def end_index(a, b, c):
        return math.ceil((float(a + 1) * float(c)) / b)
    results = []
    for ow in range(outd):
        start = start_index(ow, outd, ind)
        end = end_index(ow, outd, ind)
        sz = end - start
        results.append((start, sz))
    return results

def kernel_indexes(ind, out) -> List:
    """Returns a List [[*ind]] containing the indexes of the pooling kernels"""
    startsLengths = kernels(ind, out)
    return [list(range(start, start + length)) for (start, length) in startsLengths]
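For example (the expected outputs in the comments are worked by hand from the floor/ceil formulas above; the 6 -> 8 case corresponds to the question's setup):

print(kernel_indexes(6, 3))  # [[0, 1], [2, 3], [4, 5]] : equal size, non-overlapping
print(kernel_indexes(5, 3))  # [[0, 1], [1, 2, 3], [3, 4]] : variable size, overlapping
print(kernel_indexes(6, 8))  # [[0], [0, 1], [1, 2], [2], [3], [3, 4], [4, 5], [5]]

Averaging the question's input (1, 2, 3, 4, 5, 6) over those 6 -> 8 kernels gives exactly the 1.0, 1.5, 2.5, 3.0, 4.0, 4.5, 5.5, 6.0 the asker observed.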

Here are the key points to notice.

First, it matters a lot whether the input dimension (ind) is an integer multiple of the output dimension (outd).

Second, when this is the case, then the adaptive layer's kernels are equally-sized and non-overlapping, and are exactly what would be produced by defining kernels and a stride based on the following rule:

stride = ind // outd
kernel_size = ind - (outd - 1) * stride
padding = 0

In other words, in this case it is possible to reproduce the effect of an adaptive pooling layer by using a non-adaptive pooling layer defined with suitable stride, kernel_size, and padding. (Example further below.)

Finally, when instead it is the case that the input size is not an integer multiple of the output size, then PyTorch's adaptive pooling rule produces kernels which overlap and are of variable size.

Since the non-adaptive pooling API does not allow for variably-sized kernels, in this case it seems to me there is no way to reproduce the effect of adaptive pooling by feeding suitable values into a non-adaptive pooling layer.

Here's an example which shows both cases. This helper function lets us compare what's happening with an adaptive average pooling layer and an ordinary average pooling layer which uses a fixed stride and kernel:

import torch
import torch.nn as nn

def compare1DAdaptivity(ind, outd, inputpattern):
    c = 1
    padding = 0

    input = torch.Tensor(inputpattern).view(1, c, ind)

    stride = ind // outd
    kernel_size = ind - (outd - 1) * stride
    avg_pool = nn.AvgPool1d(stride=stride, kernel_size=kernel_size, padding=padding)
    avg_out = avg_pool(input)

    adap_avg_pool = torch.nn.AdaptiveAvgPool1d(outd)
    adap_avg_out = adap_avg_pool(input)

    try:
        equal_output = torch.allclose(avg_out, adap_avg_out)
    except Exception:
        equal_output = False

    print("input.shape: {}".format(input.shape))
    print("in_dims: {}".format(ind))
    print("out_dims: {}".format(outd))
    print("")
    print("AAL strides: {}".format(stride))
    print("AAL kernel_sizes: {}".format(kernel_size))
    print("AAL pad: {}".format(padding))
    print("")
    print("outputs equal: {}".format(equal_output))
    print("")
    print("AAL input -> output: {} -> {}".format(input, avg_out))
    print("adap input -> output: {} -> {}".format(input, adap_avg_out))
    return equal_output

So, to give an example of the first case, where the input dimension is a multiple of the output dimension, we can go from 6 to 3. We can see that the approximate adaptive layer and the true adaptive layer give the same output:

compare1DAdaptivity(6, 3, [1, 0, 0, 0, 0, 0])  # => True

AAL input -> output: tensor([[[1., 0., 0., 0., 0., 0.]]]) -> tensor([[[0.5000, 0.0000, 0.0000]]])
adap input -> output: tensor([[[1., 0., 0., 0., 0., 0.]]]) -> tensor([[[0.5000, 0.0000, 0.0000]]])

However, this no longer works if we go from 5 to 3.

compare1DAdaptivity(5, 3, [1, 0, 0, 0, 0])  # => False

AAL input -> output: tensor([[[1., 0., 0., 0., 0.]]]) -> tensor([[[0.3333, 0.0000, 0.0000]]])
adap input -> output: tensor([[[1., 0., 0., 0., 0.]]]) -> tensor([[[0.5000, 0.0000, 0.0000]]])

But we can reproduce the result of the adaptive layers by manually computing over the indexes:

t = [1, 0, 0, 0, 0]
[sum([t[x] for x in xs]) / len(xs) for xs in kernel_indexes(5, 3)]
# => [0.5, 0.0, 0.0]
Answered Oct 06 '22 by algal