Fast RCNN - ROI projection

Tags:

deep-learning

In the Fast RCNN approach, region proposals in the original image are projected onto the output of the final convolutional feature map. In the case of the VGG net, the input image is of size 224 x 224 and the final convolutional feature map is 14 x 14 x 512.

Does this mean that proposals on the input image are projected onto the feature map for ROI pooling? Is the projection a simple scaling of the bounding box?


asked Dec 02 '16 04:12

Kong

1 Answer

This article gives a good description of RoI pooling and of how you get the RoI bounding box (BB) on the feature map from the original BB on the image.

https://medium.com/datadriveninvestor/review-on-fast-rcnn-202c9eadd23b

Basically, the goal of RoI pooling is to output a fixed-size feature map from an arbitrarily sized section of the CNN output feature map.

To do this, you first do RoI projection: translate the RoI BB (x, y, h, w) from original-image coordinates to the corresponding BB on the feature map. This is done by scaling it by the sub-sampling ratio (feature map size divided by image size), so yes, the projection is a simple scaling of the bounding box.

Example:

If your image is 18x18 and your feature map is 3x3, then your sub-sampling ratio is 3/18 = 1/6.
To get your projected RoI BB, multiply each of your original BB values by that ratio, e.g. x' = (3/18)x, and likewise for y, h and w.
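The scaling in the example above can be sketched in a few lines of Python. This is only an illustration, not code from the paper or the article: the function name `project_roi` and the (x, y, h, w) tuple layout are my own assumptions, and it returns fractional coordinates, whereas real implementations typically round them to whole feature-map cells.

```python
def project_roi(box, image_size, feature_size):
    """Scale an (x, y, h, w) box from image coordinates to feature-map coordinates.

    Hypothetical helper for illustration; image_size and feature_size
    are (height, width) pairs.
    """
    x, y, h, w = box
    img_h, img_w = image_size
    feat_h, feat_w = feature_size
    # Sub-sampling ratio along each axis, e.g. 3 / 18 for the example above.
    scale_y = feat_h / img_h
    scale_x = feat_w / img_w
    # x and w run along the width, y and h along the height.
    return (x * scale_x, y * scale_y, h * scale_y, w * scale_x)


# A 12x12 box at (6, 6) in an 18x18 image, projected onto a 3x3 feature map.
projected = project_roi((6, 6, 12, 12), (18, 18), (3, 3))
print(projected)
```

For the VGG case in the question the same function would be called with an image size of (224, 224) and a feature size of (14, 14), i.e. a ratio of 1/16.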

Then you just do the pooling on that section of the feature map, using an H×W grid of pooling windows, each of size roughly (h'/H)×(w'/W), where H and W are the height and width of the target output of the pooling layer and h' and w' are the height and width of the projected RoI.
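Here is a rough sketch of that pooling step in plain Python, for a single-channel feature map stored as a list of rows. The function name `roi_max_pool` and the way bin edges are picked (floor for the start, ceil for the end) are my own assumptions for illustration; actual implementations differ in how they round bin boundaries. It assumes the projected RoI has already been rounded to integer cells and lies inside the feature map.

```python
import math


def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool the region roi = (x, y, h, w) of a 2-D feature map
    into a fixed out_h x out_w grid.

    Hypothetical helper for illustration; roi is given in integer
    feature-map cells with h >= 1 and w >= 1.
    """
    x, y, h, w = roi
    pooled = []
    for i in range(out_h):
        # Each bin covers roughly h / out_h rows of the RoI.
        r0 = y + math.floor(i * h / out_h)
        r1 = y + math.ceil((i + 1) * h / out_h)
        row = []
        for j in range(out_w):
            # Each bin covers roughly w / out_w columns of the RoI.
            c0 = x + math.floor(j * w / out_w)
            c1 = x + math.ceil((j + 1) * w / out_w)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# 4x4 feature map with values 0..15, pooled to a 2x2 output.
fm = [[4 * r + c for c in range(4)] for r in range(4)]
print(roi_max_pool(fm, (0, 0, 4, 4), 2, 2))  # -> [[5, 7], [13, 15]]
```

Whatever the size of the RoI, the output is always out_h x out_w, which is the point of the layer. When the RoI size is not divisible by the output size, neighbouring bins in this sketch overlap by a cell.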

The article gives a much better description and I encourage you to check it out and the original paper!

answered Oct 14 '22 03:10

SpaceDandy
