Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the purpose of the ROI layer in a Fast R-CNN?

In this tutorial about object detection, the fast R-CNN is mentioned. The ROI (region of interest) layer is also mentioned.

What is happening, mathematically, when region proposals get resized according to final convolution layer activation functions (in each cell)?

like image 336
Shamane Siriwardhana Avatar asked Apr 15 '17 18:04

Shamane Siriwardhana


People also ask

What is ROI in faster R-CNN?

In Fast R-CNN, the image is fed to the underlying CNN just once and the selective search is run on the other hand as usual. These region proposals generated by Selective Search are then projected on to the feature maps generated by the CNN. This process is called ROI Projection(Region Of Interest).

What is ROI in mask R-CNN?

in Mask R-CNN. Region of Interest Align, or RoIAlign, is an operation for extracting a small feature map from each RoI in detection and segmentation based tasks. It removes the harsh quantization of RoI Pool, properly aligning the extracted features with the input.

What is ROI in object detection?

Region of interest pooling (also known as RoI pooling) is an operation widely used in object detection tasks using convolutional neural networks. For example, to detect multiple cars and pedestrians in a single image.

How many layers are in faster R-CNN?

In the original Faster R-CNN paper, the R-CNN takes the feature map for each proposal, flattens it and uses two fully-connected layers of size 4096 with ReLU activation.


2 Answers

Region-of-Interest(RoI) Pooling:

It is a type of pooling layer which performs max pooling on inputs (here, convnet feature maps) of non-uniform sizes and produces a small feature map of fixed size (say 7x7). The choice of this fixed size is a network hyper-parameter and is predefined.

The main purpose of doing such a pooling is to speed up the training and test time and also to train the whole system from end-to-end (in a joint manner).

It's because of the usage of this pooling layer the training & test time is faster compared to original(vanilla?) R-CNN architecture and hence the name Fast R-CNN.

Simple example (from Region of interest pooling explained by deepsense.io):

Visualization of RoI Pooling

like image 60
kmario23 Avatar answered Oct 17 '22 14:10

kmario23


ROI (region of interest) layer is introduced in Fast R-CNN and is a special case of spatial pyramid pooling layer which is introduced in Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. The main function of ROI layer is reshape inputs with arbitrary size into a fixed length output because of size constraint in Fully Connected layers.

How ROI layer works is showed below:

enter image description here

In this image, the input image with arbitrary size is fed into this layer which has 3 different window: 4x4 (blue), 2x2 (green), 1x1 (gray) to produce outputs with fixed size of 16 x F, 4 x F, and 1 x F, respectively, where F is the number of filters. Then, those outputs are concatenated into a vector to be fed to Fully Connected layer.

like image 18
Nghia Tran Avatar answered Oct 17 '22 13:10

Nghia Tran