
Fast RCNN - ROI projection

In the Fast RCNN approach, region proposals in the original image are projected onto the final convolutional feature map. In the case of the VGG net, the input image is of size 224 x 224 and the final convolutional feature map is 14 x 14 x 512.

Does this mean that proposals on the input image are projected onto the feature map for ROI pooling? Is the projection a simple scaling of the bounding box?

Kong asked Dec 02 '16 04:12



1 Answer

This article gives a good description of RoI pooling and of how to get the equivalent RoI bounding box (BB) on the feature map from the original label.

https://medium.com/datadriveninvestor/review-on-fast-rcnn-202c9eadd23b

Basically, the goal of RoI pooling is to output a fixed-size feature map from an arbitrarily sized section of the CNN output feature map.

To do this, you first have to do RoI projection, which translates the RoI BB (x, y, h, w) from the original image into the corresponding RoI BB on the feature map. This is done by scaling the box by the sub-sampling ratio.

Example:

  • If your image is 18x18 and your feature map is 3x3, then your sub-sampling ratio is 3/18 = 1/6.
  • To get your projected RoI BB, multiply each of your original BB values by that ratio, e.g. x' = (3/18)x.
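
As a rough sketch of that projection in code: the function name `project_roi` and the (x, y, h, w) tuple layout are assumptions made for illustration, not something taken from the article or the paper. It only does the scaling step and leaves the coordinates as floats.

```python
def project_roi(box, image_size, feature_size):
    """Scale an (x, y, h, w) box from image coordinates to feature-map coordinates.

    image_size and feature_size are (height, width) pairs.
    Hypothetical helper for illustration only.
    """
    x, y, h, w = box
    img_h, img_w = image_size
    feat_h, feat_w = feature_size
    # Multiply by the sub-sampling ratio (feature size / image size) per axis.
    return (
        x * feat_w / img_w,
        y * feat_h / img_h,
        h * feat_h / img_h,
        w * feat_w / img_w,
    )


# The 18x18 image / 3x3 feature map case: ratio 3/18.
projected = project_roi((6, 12, 6, 12), (18, 18), (3, 3))
print(projected)  # (1.0, 2.0, 1.0, 2.0)

# The VGG sizes from the question: 224x224 image, 14x14 feature map (ratio 1/16).
print(project_roi((32, 64, 96, 128), (224, 224), (14, 14)))  # (2.0, 4.0, 6.0, 8.0)
```

So for the sizes in the question, the projection really is just a division of every box value by 16.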

Then you just do the pooling on that section of the feature map, using an H×W grid of pooling windows, each of size roughly (h'/H)×(w'/W), where h' and w' are the height and width of the projected RoI, and H and W are the height and width of your target output for the pooling layer.
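
A minimal sketch of that pooling step, under a few assumptions of mine: the feature map is a single channel stored as a list of rows, the projected RoI has already been rounded to whole cells, and window edges are chosen with floor/ceil so every window contains at least one cell. The name `roi_max_pool` is made up for this example; a real layer would do the same thing for each channel.

```python
def roi_max_pool(feature_map, roi, out_size):
    """Max-pool one RoI of a single-channel feature map to a fixed (H, W) grid.

    feature_map: list of rows of numbers.
    roi: (x, y, h, w) in whole feature-map cells (already projected and rounded).
    out_size: (H, W), the target output height and width.
    """
    x, y, h, w = roi
    out_h, out_w = out_size
    pooled = []
    for i in range(out_h):
        # Rows of window i: floor(i*h/H) up to (not including) ceil((i+1)*h/H).
        r0 = y + (i * h) // out_h
        r1 = y + ((i + 1) * h + out_h - 1) // out_h
        row = []
        for j in range(out_w):
            # Columns of window j, computed the same way along the width.
            c0 = x + (j * w) // out_w
            c1 = x + ((j + 1) * w + out_w - 1) // out_w
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


fmap = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

# A 3x3 RoI starting at cell (x=1, y=1), pooled down to 2x2.
print(roi_max_pool(fmap, (1, 1, 3, 3), (2, 2)))  # [[11, 12], [15, 16]]
```

Because 3 does not divide evenly by 2, neighbouring windows here overlap by one cell; that is what the "~" in the window size is about.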

The article gives a much better description and I encourage you to check it out and the original paper!

SpaceDandy answered Oct 14 '22 03:10