 

Why does roi_align not seem to work in pytorch?

I am a pytorch beginner. It seems there is a bug in the RoIAlign module in pytorch. The code is simple, but the result is not what I expected.

code:

import torch
from torchvision.ops import RoIAlign

if __name__ == '__main__':
    output_size = (3,3)
    spatial_scale = 1/4 
    sampling_ratio = 2  

    #x.shape:(1,1,6,6)
    x = torch.FloatTensor([[
        [[1,2,3,4,5,6],
        [7,8,9,10,11,12],
        [13,14,15,16,17,18],
        [19,20,21,22,23,24],
        [25,26,27,28,29,30],
        [31,32,33,34,35,36],],
    ]])

    rois = torch.tensor([
        [0,0.0,0.0,20.0,20.0],
    ])
    channel_num = x.shape[1]
    roi_num = rois.shape[0]

    a = RoIAlign(output_size, spatial_scale=spatial_scale, sampling_ratio=sampling_ratio)
    ya = a(x, rois)
    print(ya)

output:

tensor([[[[ 6.8333,  8.5000, 10.1667],
          [16.8333, 18.5000, 20.1667],
          [26.8333, 28.5000, 30.1667]]]])

But in this case shouldn't it be an average pooling operation on every 2x2 cell, like:

tensor([[[[ 4.5000,  6.5000, 8.5000],
          [16.5000, 18.5000, 20.5000],
          [28.5000, 30.5000, 32.5000]]]])
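That expected tensor is exactly what plain 2x2 average pooling with stride 2 produces; a quick sketch to check the expectation (assuming `F.avg_pool2d` matches the intended non-overlapping cell averaging):

```python
import torch
import torch.nn.functional as F

x = torch.arange(1., 37.).reshape(1, 1, 6, 6)   # same 6x6 feature map as above
out = F.avg_pool2d(x, kernel_size=2, stride=2)  # average each 2x2 cell
print(out)
# tensor([[[[ 4.5000,  6.5000,  8.5000],
#           [16.5000, 18.5000, 20.5000],
#           [28.5000, 30.5000, 32.5000]]]])
```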

My torch version is 1.3.0 with Python 3.6 and CUDA 10.1, on Ubuntu 16.04. This has puzzled me for two days; I would greatly appreciate any help.

asked Feb 04 '20 by sunshk1227


People also ask

How does RoI align work?

Region of Interest Align, or RoIAlign, is an operation for extracting a small feature map from each RoI in detection and segmentation tasks. It removes the harsh quantization of RoI Pool, properly aligning the extracted features with the input.

Why does RoI Align perform better than RoI Pooling in Mask R-CNN?

The main difference between RoI Pooling and RoI Align is quantization: RoI Align does not quantize coordinates when pooling the data. Fast R-CNN applies quantization twice, first when mapping the RoI onto the feature map and again during the pooling process.


1 Answer

Intuitive Interpretation

There are some subtleties with image coordinates. We need to account for the fact that pixels are squares, not points in space. The convention is that integer coordinates refer to pixel centers: (0, 0) is the center of the first pixel, while (-0.5, -0.5) is that pixel's upper-left corner. This is why you aren't getting the result you expect. An RoI that runs from (0, 0) to (5, 5) actually cuts through the border pixels, which leads to sampling between pixels when performing RoI align. If instead we define the RoI from (-0.5, -0.5) to (5.5, 5.5), we get the expected result. Accounting for the spatial scale factor of 1/4, this translates to an RoI from (-2, -2) to (22, 22).

import torch
from torchvision.ops import RoIAlign

output_size = (3, 3)
spatial_scale = 1 / 4
sampling_ratio = 2  

x = torch.FloatTensor([[
    [[1,  2,  3,  4,  5,  6 ],
     [7,  8,  9,  10, 11, 12],
     [13, 14, 15, 16, 17, 18],
     [19, 20, 21, 22, 23, 24],
     [25, 26, 27, 28, 29, 30],
     [31, 32, 33, 34, 35, 36]]
]])

rois = torch.tensor([
    [0, -2.0, -2.0, 22.0, 22.0],
])

a = RoIAlign(output_size, spatial_scale=spatial_scale, sampling_ratio=sampling_ratio)
ya = a(x, rois)
print(ya)

which results in

tensor([[[[ 4.5000,  6.5000,  8.5000],
          [16.5000, 18.5000, 20.5000],
          [28.5000, 30.5000, 32.5000]]]])

Alternative interpretation

Partitioning the interval [0, 5] into 3 intervals of equal length gives [0, 1.67], [1.67, 3.33], [3.33, 5]. So the boundaries of the output bins fall at these fractional coordinates, and the sample points inside each bin are bilinearly interpolated between pixel centers. Clearly this won't produce the clean 2x2 averages you expected.
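As a sanity check, the top-left value 6.8333 from the original output can be reproduced by hand (a sketch under the stated conventions: first bin [0, 5/3] in each axis, sampling_ratio=2 placing sample points at the centers of each half-bin, pixel centers at integer coordinates):

```python
bin_size = 5 / 3  # scaled RoI spans [0, 5], split into 3 equal bins

# with sampling_ratio=2, the two sample offsets per axis inside the first bin
samples = [(i + 0.5) * bin_size / 2 for i in range(2)]

# the feature map is v(r, c) = 6*r + c + 1; bilinear interpolation is exact
# for a function linear in each coordinate, so we can evaluate it directly
vals = [6 * y + x + 1 for y in samples for x in samples]
avg = sum(vals) / len(vals)
print(avg)  # ≈ 6.8333, the top-left value of the observed output
```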

answered Oct 31 '22 by jodag