This tutorial on object detection mentions Fast R-CNN and its ROI (region of interest) layer. What happens, mathematically, when region proposals are resized according to the activations of the final convolution layer (in each cell)?


What is the purpose of the ROI layer in a Fast R-CNN?

2 Answers

Region-of-Interest (RoI) Pooling:

It is a type of pooling layer which performs max pooling on inputs (here, convnet feature maps) of non-uniform sizes and produces a small feature map of fixed size (say 7x7). The choice of this fixed size is a network hyper-parameter and is predefined.
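To make the "mathematically" part of the question concrete, the pooling can be written out as below. This is a sketch under stated assumptions rather than the exact rule of any one implementation: the symbols (x for the feature map, k for the channel, (r, c) for the top-left cell of the RoI on the feature map, h x w for its size in cells, H x W for the fixed output size, s for the stride of the convnet) are introduced here for illustration, and the floor/ceil bin edges are one common convention. Implementations differ in how they round.

```latex
% Projection of an image-space box onto the feature map (stride s, rounding is an assumption):
%   r \approx \mathrm{round}(y_{\text{img}} / s), \qquad c \approx \mathrm{round}(x_{\text{img}} / s)

% Max pooling of the h x w RoI into a fixed H x W grid, for 0 <= i < H and 0 <= j < W:
y_{k,i,j} \;=\; \max_{(p,q)\,\in\,B_{ij}} \; x_{k,\;r+p,\;c+q}

B_{ij} \;=\; \Bigl\{ (p,q) \;:\;
  \bigl\lfloor \tfrac{i\,h}{H} \bigr\rfloor \le p < \bigl\lceil \tfrac{(i+1)\,h}{H} \bigr\rceil ,\;\;
  \bigl\lfloor \tfrac{j\,w}{W} \bigr\rfloor \le q < \bigl\lceil \tfrac{(j+1)\,w}{W} \bigr\rceil
\Bigr\}
```

Each bin covers roughly h/H x w/W cells, so the output is always H x W per channel no matter how large or small the proposal is. Nothing is interpolated or rescaled; the "resizing" is just a max over a variable-sized window.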

The main purpose of this pooling is to speed up both training and testing, and to allow the whole system to be trained end-to-end (jointly).

The speed-up comes from computing the convolutional feature map once per image and pooling every region proposal from it, instead of running the convnet on each proposal separately as the original (vanilla) R-CNN does. Because of this pooling layer, training and testing are faster than with R-CNN, hence the name Fast R-CNN.

Simple example (from Region of interest pooling explained by deepsense.io):

Visualization of RoI Pooling
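As a rough numerical illustration of the same idea (not the deepsense.io example itself), here is a minimal NumPy sketch. The names `ceil_div` and `roi_max_pool` are hypothetical, the RoI is assumed to be given already in feature-map cells (end-exclusive), and the floor/ceil bin edges are an assumed convention that real implementations may round differently.

```python
import numpy as np


def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def roi_max_pool(feature_map, roi, output_size):
    """Max-pool one RoI of a (C, H, W) feature map down to (C, out_h, out_w).

    roi is (y0, x0, y1, x1) in feature-map cells, end-exclusive (an assumption).
    """
    y0, x0, y1, x1 = roi
    out_h, out_w = output_size
    region = feature_map[:, y0:y1, x0:x1]
    h, w = region.shape[1:]
    pooled = np.empty((feature_map.shape[0], out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        # Floor/ceil bin edges: bins are never empty and together cover the RoI.
        r0, r1 = (i * h) // out_h, ceil_div((i + 1) * h, out_h)
        for j in range(out_w):
            c0, c1 = (j * w) // out_w, ceil_div((j + 1) * w, out_w)
            pooled[:, i, j] = region[:, r0:r1, c0:c1].max(axis=(1, 2))
    return pooled


# One channel, 8x8 feature map whose value at (row, col) is 8*row + col.
feature_map = np.arange(64).reshape(1, 8, 8)

# A 5x6 RoI (rows 1..5, cols 2..7) pooled to 2x2.
pooled_small = roi_max_pool(feature_map, (1, 2, 6, 8), (2, 2))
print(pooled_small[0])  # [[28 31]
                        #  [44 47]]

# A different-sized RoI gives the same output shape.
pooled_full = roi_max_pool(feature_map, (0, 0, 8, 8), (2, 2))
print(pooled_small.shape, pooled_full.shape)  # (1, 2, 2) (1, 2, 2)
```

Both RoIs end up as 2x2 maps, which is what lets the following fully connected layers have a fixed input size.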

answered Oct 17 '22 14:10

kmario23

The ROI (region of interest) layer was introduced in Fast R-CNN and is a special case of the spatial pyramid pooling (SPP) layer introduced in <a href="https://arxiv.org/pdf/1406.4729.pdf" rel="noreferrer">Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition</a>. Its main function is to reshape inputs of arbitrary size into a fixed-length output, which the fully connected layers require.

How the layer works is shown below:

<img src="https://i.stack.imgur.com/4uCN1.png" alt="Pooling with 4x4, 2x2 and 1x1 windows">

In this image, a feature map of arbitrary size is fed into the layer, which pools it over three windows: 4x4 (blue), 2x2 (green) and 1x1 (gray). This produces outputs with fixed sizes of 16 x F, 4 x F and 1 x F, respectively, where F is the number of filters. Those outputs are then concatenated into a single vector of length 21 x F and fed to the fully connected layer. The ROI layer uses only one such grid (a single pyramid level) and applies it to each region proposal.
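The three-level pooling described above can be sketched in NumPy as follows. This is an illustration under assumptions, not the paper's code: `ceil_div`, `grid_max_pool` and `spp` are hypothetical names, the floor/ceil bin edges are one possible convention, and the order in which values are flattened into the vector is arbitrary.

```python
import numpy as np


def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def grid_max_pool(feature_map, n):
    """Max-pool a (F, H, W) feature map over an n x n grid, giving (F, n, n)."""
    num_filters, h, w = feature_map.shape
    out = np.empty((num_filters, n, n), dtype=feature_map.dtype)
    for i in range(n):
        r0, r1 = (i * h) // n, ceil_div((i + 1) * h, n)
        for j in range(n):
            c0, c1 = (j * w) // n, ceil_div((j + 1) * w, n)
            out[:, i, j] = feature_map[:, r0:r1, c0:c1].max(axis=(1, 2))
    return out


def spp(feature_map, levels=(4, 2, 1)):
    """Concatenate the pooled outputs of every grid level into one vector."""
    return np.concatenate([grid_max_pool(feature_map, n).reshape(-1) for n in levels])


rng = np.random.default_rng(0)
fmap_a = rng.standard_normal((3, 13, 17))  # F = 3 filters, 13x17 map
fmap_b = rng.standard_normal((3, 6, 9))    # same F, different spatial size

vec_a = spp(fmap_a)
vec_b = spp(fmap_b)
print(vec_a.shape, vec_b.shape)  # (63,) (63,)
```

With F = 3 the vector has (16 + 4 + 1) x 3 = 63 entries for both inputs, regardless of their spatial size. Calling `spp(fmap, levels=(7,))` would give the single-level case that corresponds to ROI pooling with a 7x7 output.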

answered Oct 17 '22 13:10

Nghia Tran


Tags:

deep-learning

computer-vision

conv-neural-network

object-detection

Shamane Siriwardhana
