One stage vs two stage object detection

Tags: artificial-intelligence, machine-learning, computer-vision, object-detection, yolo

I was going through the YOLOv4 paper, which often uses the terms one-stage and two-stage object detection. I was unable to understand the difference between the two types of object detectors. I am assuming:

One-stage detectors do both region detection and object classification using one network only.
Two-stage detectors do the above operations using two different networks.

Is this assumption correct?

asked Jan 28 '21 17:01

Mehul Gupta

2 Answers

Instead of "region detection + object classification", two-stage detectors perform "(1) region proposal + (2) classification and localization".

(1) Region proposal is done by what is called a Region Proposal Network (RPN). The RPN decides “where” to look, which reduces the computational cost of the overall inference process. It quickly scans every location to assess whether further processing needs to be carried out in a given region. It does that by outputting k bounding-box proposals at each location, each with 2 scores representing the probability of object versus not object. In other words, it is used to find up to a predefined number (~2000) of regions (bounding boxes) that may contain objects.
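To make the "2 scores per proposal, keep up to ~2000" idea concrete, here is a minimal pure-Python sketch. The function names `objectness` and `select_proposals` are hypothetical, and the sketch assumes the two scores are raw logits ordered as (background, object). It also leaves out non-maximum suppression, which Faster R-CNN applies before the final cut.

```python
import math

def objectness(score_bg, score_fg):
    """Softmax over the two RPN scores; returns the probability of 'object'."""
    m = max(score_bg, score_fg)  # subtract the max for numerical stability
    e_bg = math.exp(score_bg - m)
    e_fg = math.exp(score_fg - m)
    return e_fg / (e_bg + e_fg)

def select_proposals(boxes, scores, max_proposals=2000):
    """Keep the max_proposals boxes with the highest objectness.

    boxes:  list of (x1, y1, x2, y2)
    scores: list of (score_bg, score_fg), one pair per box (assumed layout)
    """
    ranked = sorted(
        zip(boxes, scores),
        key=lambda pair: objectness(*pair[1]),
        reverse=True,
    )
    return [box for box, _ in ranked[:max_proposals]]

boxes = [(0, 0, 10, 10), (5, 5, 20, 20), (1, 1, 2, 2)]
scores = [(2.0, 0.0), (0.0, 3.0), (0.0, 1.0)]
top_two = select_proposals(boxes, scores, max_proposals=2)
```

Only the boxes that survive this ranking are passed on to the second stage, which is where the savings in computation come from.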

An important problem within object detection is generating a variable-length list of bounding boxes. The RPN solves the variable-length problem by using anchors: fixed-size reference bounding boxes placed uniformly throughout the original image. Instead of having to detect where objects are, we split the problem into two parts. For every anchor, we ask:

Does this anchor contain a relevant object?
How would we adjust this anchor to better fit the relevant object?
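The two questions above can be sketched in code. The snippet below is an illustration only: `generate_anchors` and `apply_deltas` are hypothetical names, the stride, sizes and ratios are arbitrary example values, and the (dx, dy, dw, dh) offsets follow the box parameterisation used in Faster R-CNN. The first function places the fixed-size reference boxes on a regular grid, and the second shows how a predicted adjustment moves and rescales one anchor.

```python
import math

def generate_anchors(image_w, image_h, stride=16, sizes=(64, 128), ratios=(0.5, 1.0, 2.0)):
    """Return (x1, y1, x2, y2) anchors centred on a regular grid over the image."""
    anchors = []
    for cy in range(stride // 2, image_h, stride):
        for cx in range(stride // 2, image_w, stride):
            for size in sizes:
                for ratio in ratios:
                    # ratio is height / width; the area stays size * size
                    w = size / math.sqrt(ratio)
                    h = size * math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

def apply_deltas(anchor, deltas):
    """Shift and rescale an anchor by predicted offsets (dx, dy, dw, dh)."""
    x1, y1, x2, y2 = anchor
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + w / 2, y1 + h / 2
    dx, dy, dw, dh = deltas
    new_cx, new_cy = cx + dx * w, cy + dy * h          # move the centre
    new_w, new_h = w * math.exp(dw), h * math.exp(dh)  # rescale width and height
    return (new_cx - new_w / 2, new_cy - new_h / 2, new_cx + new_w / 2, new_cy + new_h / 2)

anchors = generate_anchors(64, 32, stride=16, sizes=(32,), ratios=(1.0,))
shifted = apply_deltas((0, 0, 10, 10), (0.5, 0.0, 0.0, 0.0))
```

Because the number of anchors is fixed by the image size and the grid, the network always produces a fixed-size output, and the variable-length list of detections comes from filtering the anchors afterwards.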

Once we have a list of possible relevant objects and their locations in the original image, the problem becomes more straightforward to solve. Using the features extracted by the CNN and the bounding boxes with relevant objects, we apply Region of Interest (RoI) Pooling and extract the features that correspond to the relevant objects into a new tensor.
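As a rough illustration of RoI pooling, the sketch below max-pools one region of a single-channel feature map into a fixed grid of bins. It is a simplification under stated assumptions: `roi_max_pool` is a hypothetical name, the feature map is a plain list of rows, the RoI is already given in feature-map coordinates with exclusive upper bounds, and it covers at least one cell inside the map. Real implementations work on multi-channel tensors.

```python
def roi_max_pool(feature_map, roi, output_size=2):
    """Max-pool the RoI of a 2-D feature map into output_size x output_size bins.

    feature_map: list of rows (H x W) for a single channel
    roi: (x1, y1, x2, y2) in feature-map cells, x2 and y2 exclusive
    """
    x1, y1, x2, y2 = roi
    roi_w, roi_h = x2 - x1, y2 - y1
    pooled = []
    for by in range(output_size):
        # integer bin edges; every bin covers at least one cell
        y_start = y1 + (by * roi_h) // output_size
        y_end = max(y1 + ((by + 1) * roi_h) // output_size, y_start + 1)
        row = []
        for bx in range(output_size):
            x_start = x1 + (bx * roi_w) // output_size
            x_end = max(x1 + ((bx + 1) * roi_w) // output_size, x_start + 1)
            row.append(max(
                feature_map[y][x]
                for y in range(y_start, y_end)
                for x in range(x_start, x_end)
            ))
        pooled.append(row)
    return pooled

feature_map = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
full = roi_max_pool(feature_map, (0, 0, 4, 4))   # 4x4 region
small = roi_max_pool(feature_map, (1, 1, 4, 4))  # 3x3 region
```

Both regions come out as 2x2 grids even though they differ in size, which is what lets the second stage use the same fixed-size layers for every proposal.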

Next, in the second stage, the R-CNN module uses this information to:

Classify the content in the bounding box (or discard it, using “background” as a label).
Adjust the bounding box coordinates (so it better fits the object).
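The two second-stage tasks can be sketched as follows. This is a simplified illustration, not the actual R-CNN head: `second_stage` is a hypothetical name, index 0 of `class_names` is assumed to be the "background" label, the `min_score` threshold is an arbitrary example value, and a single class-agnostic set of offsets per proposal is assumed.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def second_stage(proposals, class_logits, box_deltas, class_names, min_score=0.5):
    """Classify each proposal and refine its box; drop background and weak scores.

    class_names[0] is assumed to be "background".
    box_deltas holds one (dx, dy, dw, dh) tuple per proposal.
    """
    detections = []
    for box, logits, (dx, dy, dw, dh) in zip(proposals, class_logits, box_deltas):
        probs = softmax(logits)
        label = max(range(len(probs)), key=probs.__getitem__)
        if label == 0 or probs[label] < min_score:
            continue  # discard: background or not confident enough
        x1, y1, x2, y2 = box
        w, h = x2 - x1, y2 - y1
        cx, cy = x1 + w / 2 + dx * w, y1 + h / 2 + dy * h  # adjusted centre
        w, h = w * math.exp(dw), h * math.exp(dh)          # adjusted size
        detections.append((class_names[label], probs[label],
                           (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)))
    return detections

names = ["background", "car", "person"]
proposals = [(0, 0, 10, 10), (20, 20, 40, 40)]
logits = [[5.0, 0.0, 0.0], [0.0, 4.0, 0.0]]
deltas = [(0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)]
detections = second_stage(proposals, logits, deltas, names)
```

Here the first proposal is discarded as background and only the second survives, labelled "car".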

answered Oct 24 '22 06:10

Abhi25t

One-stage detectors:

Object classification and bounding-box regression are done directly without using pre-generated region proposals (candidate object bounding-boxes).
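A minimal sketch of what "done directly" can look like: every cell of a dense prediction grid already carries class scores and a box, so detections are read off in a single pass with no proposal step. The encoding below is hypothetical and only loosely modelled on grid-based detectors such as YOLO. It is not the actual YOLOv4 output format, and `decode_one_stage` and its parameters are made-up names.

```python
def decode_one_stage(grid, image_size, class_names, min_score=0.5):
    """Turn a dense S x S prediction grid into detections in one pass.

    grid[row][col] = (tx, ty, tw, th, class_scores) where (assumed encoding)
      tx, ty: box-centre offset inside the cell, in [0, 1]
      tw, th: box width and height as a fraction of the image size
      class_scores: one probability per class
    """
    s = len(grid)
    cell = image_size / s
    detections = []
    for row in range(s):
        for col in range(s):
            tx, ty, tw, th, class_scores = grid[row][col]
            label = max(range(len(class_scores)), key=class_scores.__getitem__)
            score = class_scores[label]
            if score < min_score:
                continue  # no confident object in this cell
            cx, cy = (col + tx) * cell, (row + ty) * cell
            w, h = tw * image_size, th * image_size
            detections.append((class_names[label], score,
                               (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)))
    return detections

empty = (0.5, 0.5, 0.1, 0.1, [0.1, 0.2])
grid = [[empty, empty], [(0.5, 0.5, 0.5, 0.5, [0.9, 0.05]), empty]]
found = decode_one_stage(grid, image_size=100, class_names=["cat", "dog"])
```

There is no per-region second pass here, which is the main reason this style of detector tends to be faster.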

Two-stage detectors:

1. Generation of region proposals, e.g. by selective search as in R-CNN and Fast R-CNN, or by a Region Proposal Network (RPN) as in Faster R-CNN.
2. Object classification for each region proposal. Additional steps can be performed as well, such as bounding-box regression to refine the region proposals or binary-mask prediction.

Two-stage detectors usually reach better accuracy but are slower than one-stage detectors.

Figure: https://i.stack.imgur.com/WYQp3.png (image taken from "On the Performance of One-Stage and Two-Stage Object Detectors in Autonomous Vehicles Using Camera Data")

answered Oct 24 '22 06:10

Andreas K.
