 

How does adaptive pooling in PyTorch work?

Tags:

python

pytorch

Adaptive pooling is a great function, but how does it work? It seems to insert pads or shrink/expand kernel sizes in what seems like a patterned but fairly arbitrary way. The PyTorch documentation I can find is not more descriptive than "put desired output size here." Does anyone know how this works, or can you point to where it's explained?

Some test code on a 1x1x6 tensor, (1,2,3,4,5,6), with an adaptive output of size 8:

import torch
import torch.nn as nn

class TestNet(nn.Module):
    def __init__(self):
        super(TestNet, self).__init__()
        self.avgpool = nn.AdaptiveAvgPool1d(8)

    def forward(self, x):
        print(x)
        x = self.avgpool(x)
        print(x)
        return x

def test():
    x = torch.Tensor([[[1, 2, 3, 4, 5, 6]]])
    net = TestNet()
    y = net(x)
    return y

test()

Output:

tensor([[[ 1.,  2.,  3.,  4.,  5.,  6.]]])
tensor([[[ 1.0000,  1.5000,  2.5000,  3.0000,  4.0000,  4.5000,  5.5000,
           6.0000]]])

If it mirror-pads by one on the left and right (operating on (1,1,2,3,4,5,6,6)) and has a kernel of 2, then the outputs for all positions except 4 and 5 make sense, except of course that the output isn't the right size. Is it also padding the 3 and 4 internally? If so, it's operating on (1,1,2,3,3,4,4,5,6,6), which, with a size-2 kernel, produces the wrong output size and would also miss a 3.5 output. Is it changing the size of the kernel?

Am I missing something obvious about the way this works?

Asked Dec 18 '18 by S C




2 Answers

In general, pooling reduces dimensions. If you want to increase dimensions, you might want to look at interpolation.
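For the question's 6 -> 8 case, here is a minimal sketch of that alternative using torch.nn.functional.interpolate (illustrative only, not part of the original answer):

import torch
import torch.nn.functional as F

x = torch.Tensor([[[1., 2., 3., 4., 5., 6.]]])
# linearly interpolate the length-6 signal up to length 8
y = F.interpolate(x, size=8, mode='linear', align_corners=True)
print(y.shape)  # torch.Size([1, 1, 8])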

Anyway, let's talk about adaptive pooling in general. You can look at the source code here. Some have claimed that adaptive pooling is the same as standard pooling with stride and kernel size calculated from the input and output size. Specifically, the following parameters are used:

  1. Stride = (input_size//output_size)
  2. Kernel size = input_size - (output_size-1)*stride
  3. Padding = 0

These are worked backwards from the standard pooling formula, output_size = (input_size - kernel_size) // stride + 1 (with padding = 0). While they DO produce output of the desired size, the output is not necessarily the same as that of adaptive pooling. Here is a test snippet:

import torch
import torch.nn as nn

in_length = 5
out_length = 3

x = torch.arange(0, in_length).view(1, 1, -1).float()
print(x)

stride = in_length // out_length
avg_pool = nn.AvgPool1d(
    stride=stride,
    kernel_size=(in_length - (out_length - 1) * stride),
    padding=0,
)
adaptive_pool = nn.AdaptiveAvgPool1d(out_length)

print(avg_pool.stride, avg_pool.kernel_size)

y_avg = avg_pool(x)
y_ada = adaptive_pool(x)

print(y_avg)
print(y_ada)
# total absolute difference between the two outputs
# (reconstructed to match the "Error" line in the output below)
print('Error: ', (y_avg - y_ada).abs().sum().item())

Output:

tensor([[[0., 1., 2., 3., 4.]]])
(1,) (3,)
tensor([[[1., 2., 3.]]])
tensor([[[0.5000, 2.0000, 3.5000]]])
Error:  1.0

Average pooling pools from elements (0, 1, 2), (1, 2, 3) and (2, 3, 4).

Adaptive pooling pools from elements (0, 1), (1, 2, 3) and (3, 4). (Change the code a bit to see that it is not pooling from (2) only)
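One quick way to verify this (a minimal probe, not part of the original answer): feed one-hot inputs through the adaptive layer; the nonzero output positions reveal which kernels each input element belongs to.

import torch
import torch.nn as nn

pool = nn.AdaptiveAvgPool1d(3)
for i in range(5):
    x = torch.zeros(1, 1, 5)
    x[0, 0, i] = 1.0             # one-hot probe at input position i
    print(i, pool(x).flatten())  # nonzero entries mark the kernels containing i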

  • You can tell that adaptive pooling tries to reduce overlap between kernels.
  • The difference can be mitigated using padding with count_include_pad=False (see the sketch after this list), but in general I don't think the two can be made exactly the same for 2D or higher for all input/output sizes. One would need different paddings for left/right, which is not supported in pooling layers at the moment.
  • From a practical perspective it should not matter much.
  • Check the code for the actual implementation.
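For instance, in the 5 -> 3 case above, symmetric zero padding combined with count_include_pad=False happens to reproduce the adaptive output exactly (a case-specific sketch, not from the original answer):

import torch
import torch.nn as nn

x = torch.arange(0, 5).view(1, 1, -1).float()

# kernel 3, stride 2, one zero pad on each side; with count_include_pad=False the
# pads are excluded from the averages, giving divisors 2, 3, 2 -- the same as
# adaptive pooling's kernels (0, 1), (1, 2, 3), (3, 4)
manual = nn.AvgPool1d(kernel_size=3, stride=2, padding=1, count_include_pad=False)
adaptive = nn.AdaptiveAvgPool1d(3)

print(manual(x))    # tensor([[[0.5000, 2.0000, 3.5000]]])
print(adaptive(x))  # tensor([[[0.5000, 2.0000, 3.5000]]])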
Answered Oct 06 '22 by hkchengrex


As hkchengrex's answer points out, the PyTorch documentation does not explain what rule adaptive pooling layers use to determine the size and locations of the pooling kernels. (In fact, there is a FIXME in the PyTorch code indicating the documentation needs to be improved.)

However, the calculation of the kernel sizes and locations is implemented by this C++ function, and the key logic is actually in the calls to the functions start_index and end_index, which define each kernel's location and extent.

I believe this Python code re-implements that code and shows how kernels are calculated:

from typing import List
import math

def kernels(ind, outd) -> List:
    """Returns a List [(kernel_offset_start, kernel_length)] defining all the pooling
    kernels for a 1-D adaptive pooling layer that takes an input of dimension `ind`
    and yields an output of dimension `outd`."""
    def start_index(a, b, c):
        return math.floor((float(a) * float(c)) / b)
    def end_index(a, b, c):
        return math.ceil((float(a + 1) * float(c)) / b)
    results = []
    for ow in range(outd):
        start = start_index(ow, outd, ind)
        end = end_index(ow, outd, ind)
        sz = end - start
        results.append((start, sz))
    return results

def kernel_indexes(ind, out) -> List:
    """Returns a List [[*ind]] containing the indexes of the pooling kernels"""
    startsLengths = kernels(ind, out)
    return [list(range(start, start + length)) for (start, length) in startsLengths]
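For example (the expected outputs in the comments are worked by hand from the floor/ceil formulas above; the 6 -> 8 case corresponds to the question's setup):

print(kernel_indexes(6, 3))  # [[0, 1], [2, 3], [4, 5]] : equal size, non-overlapping
print(kernel_indexes(5, 3))  # [[0, 1], [1, 2, 3], [3, 4]] : variable size, overlapping
print(kernel_indexes(6, 8))  # [[0], [0, 1], [1, 2], [2], [3], [3, 4], [4, 5], [5]]

Averaging the question's input (1, 2, 3, 4, 5, 6) over those 6 -> 8 kernels gives exactly the 1.0, 1.5, 2.5, 3.0, 4.0, 4.5, 5.5, 6.0 the asker observed.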

Here are the key points to notice.

First, it matters a lot whether the input dimension (ind) is an integer multiple of the output dimension (outd).

Second, when this is the case, then the adaptive layer's kernels are equally-sized and non-overlapping, and are exactly what would be produced by defining kernels and a stride based on the following rule:

stride = ind // outd
kernel_size = ind - (outd - 1) * stride
padding = 0

In other words, in this case it is possible to reproduce the effect of an adaptive pooling layer by using a non-adaptive pooling layer defined with suitable stride, kernel_size, and padding. (Example further below.)

Finally, when instead it is the case that the input size is not an integer multiple of the output size, then PyTorch's adaptive pooling rule produces kernels which overlap and are of variable size.

Since the non-adaptive pooling API does not allow for variably-sized kernels, in this case it seems to me there is no way to reproduce the effect of adaptive pooling by feeding suitable values into a non-adaptive pooling layer.

Here's an example which shows both cases. This helper function lets us compare what's happening with an adaptive average pooling layer and an ordinary average pooling layer which uses a fixed stride and kernel:

import torch
import torch.nn as nn

def compare1DAdaptivity(ind, outd, inputpattern):
    c = 1
    padding = 0

    input = torch.Tensor(inputpattern).view(1, c, ind)

    stride = ind // outd
    kernel_size = ind - (outd - 1) * stride
    avg_pool = nn.AvgPool1d(stride=stride, kernel_size=kernel_size, padding=padding)
    avg_out = avg_pool(input)

    adap_avg_pool = torch.nn.AdaptiveAvgPool1d(outd)
    adap_avg_out = adap_avg_pool(input)

    try:
        equal_output = torch.allclose(avg_out, adap_avg_out)
    except Exception:
        equal_output = False

    print("input.shape: {}".format(input.shape))
    print("in_dims: {}".format(ind))
    print("out_dims: {}".format(outd))
    print("")
    print("AAL strides: {}".format(stride))
    print("AAL kernel_sizes: {}".format(kernel_size))
    print("AAL pad: {}".format(padding))
    print("")
    print("outputs equal: {}".format(equal_output))
    print("")
    print("AAL input -> output: {} -> {}".format(input, avg_out))
    print("adap input -> output: {} -> {}".format(input, adap_avg_out))
    return equal_output

So, to give an example of the first case, where the input dimension is a multiple of the output dimension, we can go from 6 to 3. We can see that the approximate adaptive layer and the true adaptive layer give the same output:

compare1DAdaptivity(6, 3, [1, 0, 0, 0, 0, 0])  # => True

AAL input -> output: tensor([[[1., 0., 0., 0., 0., 0.]]]) -> tensor([[[0.5000, 0.0000, 0.0000]]])
adap input -> output: tensor([[[1., 0., 0., 0., 0., 0.]]]) -> tensor([[[0.5000, 0.0000, 0.0000]]])

However, this no longer works if we go from 5 to 3.

compare1DAdaptivity(5, 3, [1, 0, 0, 0, 0])  # => False

AAL input -> output: tensor([[[1., 0., 0., 0., 0.]]]) -> tensor([[[0.3333, 0.0000, 0.0000]]])
adap input -> output: tensor([[[1., 0., 0., 0., 0.]]]) -> tensor([[[0.5000, 0.0000, 0.0000]]])

But we can reproduce the result of the adaptive layers by manually computing over the indexes:

t = [1, 0, 0, 0, 0]
[sum([t[x] for x in xs]) / len(xs) for xs in kernel_indexes(5, 3)]
# => [0.5, 0.0, 0.0]
Answered Oct 06 '22 by algal