 

Why does roi_align not seem to work in pytorch?

I am a pytorch beginner. It seems there is a bug in the RoIAlign module in pytorch. The code is simple, but the result is not what I expected.

code:

import torch
from torchvision.ops import RoIAlign

if __name__ == '__main__':
    output_size = (3,3)
    spatial_scale = 1/4 
    sampling_ratio = 2  

    #x.shape:(1,1,6,6)
    x = torch.FloatTensor([[
        [[1,2,3,4,5,6],
        [7,8,9,10,11,12],
        [13,14,15,16,17,18],
        [19,20,21,22,23,24],
        [25,26,27,28,29,30],
        [31,32,33,34,35,36],],
    ]])

    rois = torch.tensor([
        [0,0.0,0.0,20.0,20.0],
    ])
    channel_num = x.shape[1]
    roi_num = rois.shape[0]

    a = RoIAlign(output_size, spatial_scale=spatial_scale, sampling_ratio=sampling_ratio)
    ya = a(x, rois)
    print(ya)

output:

tensor([[[[ 6.8333,  8.5000, 10.1667],
          [16.8333, 18.5000, 20.1667],
          [26.8333, 28.5000, 30.1667]]]])

But in this case shouldn't it be an average pooling operation on every 2x2 cell, like:

tensor([[[[ 4.5000,  6.5000, 8.5000],
          [16.5000, 18.5000, 20.5000],
          [28.5000, 30.5000, 32.5000]]]])
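That expected tensor is exactly what plain 2x2 average pooling with stride 2 produces; a quick sketch to check the expectation (assuming `F.avg_pool2d` matches the intended non-overlapping cell averaging):

```python
import torch
import torch.nn.functional as F

x = torch.arange(1., 37.).reshape(1, 1, 6, 6)   # same 6x6 feature map as above
out = F.avg_pool2d(x, kernel_size=2, stride=2)  # average each 2x2 cell
print(out)
# tensor([[[[ 4.5000,  6.5000,  8.5000],
#           [16.5000, 18.5000, 20.5000],
#           [28.5000, 30.5000, 32.5000]]]])
```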

My torch version is 1.3.0 with Python 3.6 and CUDA 10.1, on Ubuntu 16.04. This has puzzled me for two days; I would greatly appreciate any help.

asked Feb 04 '20 by sunshk1227


People also ask

How does RoI align work?

Region of Interest Align, or RoIAlign, is an operation for extracting a small feature map from each RoI in detection and segmentation tasks. It removes the harsh quantization of RoI Pool, properly aligning the extracted features with the input.

Why does RoI Align perform better than RoI Pooling in Mask R-CNN?

The main difference between RoI Pooling and RoI Align is quantization: RoI Align does not quantize coordinates when pooling the data. Fast R-CNN applies quantization twice, first when mapping the RoI onto the feature map and again during the pooling process.


1 Answer

Intuitive Interpretation

There are some subtleties with image coordinates. We need to account for the fact that pixels are squares, not points in space. The convention is that integer coordinates refer to pixel centers: (0, 0) is the center of the first pixel, while (-0.5, -0.5) is that pixel's upper-left corner. This is why you aren't getting the result you expect. An RoI that runs from (0, 0) to (5, 5) actually cuts through the border pixels, which leads to sampling between pixels when performing RoI align. If instead we define the RoI from (-0.5, -0.5) to (5.5, 5.5), we get the expected result. Accounting for the spatial scale factor of 1/4, this translates to an RoI from (-2, -2) to (22, 22).

import torch
from torchvision.ops import RoIAlign

output_size = (3, 3)
spatial_scale = 1 / 4
sampling_ratio = 2  

x = torch.FloatTensor([[
    [[1,  2,  3,  4,  5,  6 ],
     [7,  8,  9,  10, 11, 12],
     [13, 14, 15, 16, 17, 18],
     [19, 20, 21, 22, 23, 24],
     [25, 26, 27, 28, 29, 30],
     [31, 32, 33, 34, 35, 36]]
]])

rois = torch.tensor([
    [0, -2.0, -2.0, 22.0, 22.0],
])

a = RoIAlign(output_size, spatial_scale=spatial_scale, sampling_ratio=sampling_ratio)
ya = a(x, rois)
print(ya)

which results in

tensor([[[[ 4.5000,  6.5000,  8.5000],
          [16.5000, 18.5000, 20.5000],
          [28.5000, 30.5000, 32.5000]]]])

Alternative interpretation

Partitioning the interval [0, 5] into 3 intervals of equal length gives [0, 1.67], [1.67, 3.33], [3.33, 5]. So the boundaries of the output bins fall at these fractional coordinates, and the sample points inside each bin are bilinearly interpolated between pixel centers. Clearly this won't produce the clean 2x2 averages you expected.
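As a sanity check, the top-left value 6.8333 from the original output can be reproduced by hand (a sketch under the stated conventions: first bin [0, 5/3] in each axis, sampling_ratio=2 placing sample points at the centers of each half-bin, pixel centers at integer coordinates):

```python
bin_size = 5 / 3  # scaled RoI spans [0, 5], split into 3 equal bins

# with sampling_ratio=2, the two sample offsets per axis inside the first bin
samples = [(i + 0.5) * bin_size / 2 for i in range(2)]

# the feature map is v(r, c) = 6*r + c + 1; bilinear interpolation is exact
# for a function linear in each coordinate, so we can evaluate it directly
vals = [6 * y + x + 1 for y in samples for x in samples]
avg = sum(vals) / len(vals)
print(avg)  # ≈ 6.8333, the top-left value of the observed output
```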

answered Oct 31 '22 by jodag