I was going through the YOLOv4 paper, which often uses the terms one-stage and two-stage object detection. I could not understand the difference between the two types of object detectors. I am assuming
Is this assumption correct?
Multi-stage (Two-stage) object detection
The object detection task aims to draw bounding boxes around multiple objects in a given image, which is important in many fields, including autonomous driving. Generally, object detection algorithms can be classified into two categories: single-stage models and multi-stage models.
One-stage object detection models skip the region proposal stage of two-stage models and run detection directly over a dense sampling of locations. These models usually have faster inference (possibly at the cost of accuracy).
YOLO works on the single-stage detection principle, meaning it unifies all the components of the object detection pipeline into a single neural network. It uses features from the entire image to predict class probabilities and bounding box coordinates.
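As a rough illustration of a single network emitting everything at once, the sketch below computes the size of the output tensor for a grid-based head laid out like the original YOLO (v1): each of S x S cells predicts B boxes (x, y, w, h, confidence) plus C class probabilities. The helper name `yolo_v1_output_shape` and the defaults S=7, B=2, C=20 are assumptions chosen for illustration; later YOLO versions, including YOLOv4, use a different head layout.

```python
def yolo_v1_output_shape(grid_size=7, boxes_per_cell=2, num_classes=20):
    """Shape of the single output tensor of a YOLOv1-style head.

    Every grid cell predicts `boxes_per_cell` boxes, each described by
    (x, y, w, h, confidence), plus one shared set of class probabilities.
    """
    values_per_cell = boxes_per_cell * 5 + num_classes
    return (grid_size, grid_size, values_per_cell)


shape = yolo_v1_output_shape()
total_values = shape[0] * shape[1] * shape[2]
print(shape)         # (7, 7, 30)
print(total_values)  # 1470 numbers from one forward pass
```

Boxes and class probabilities come out of the same tensor, so there is no separate proposal step to run first.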
After the image input, a backbone network is used for feature extraction. A two-stage detection module computes object localization and classification separately, whereas a one-stage module, with computational speed in mind, combines the object localization and ... ...
One branch of object detectors is based on multi-stage models. Deriving from the work of R-CNN, one model is used to extract regions of objects, and a second model is used to classify and further refine the localization of the object.
that a boom in the field of two-stage object detection is observed from the years 2017 to 2020. The year 2020 is marked with the highest number of publications.
Fig. 2. Yearly publication count for Two Stage Object Detection
high inference speeds and two-stage detectors have high localization and recognition accuracy. The two stages of a two-stage detector can be divided by a RoI (Region of Interest) Pooling layer. One of the prominent two-stage object detectors is Faster R-CNN. It has the first stage called RPN,
Instead of "region detection + object classification", it's "(1) region proposal + (2) classification and localization" in two-stage detectors.
Region proposal (step 1) is done by what is called a Region Proposal Network (RPN, for short). The RPN is used to decide “where” to look in order to reduce the computational requirements of the overall inference process. It quickly and efficiently scans every location to assess whether further processing needs to be carried out in a given region. It does that by outputting k bounding box proposals at each location, each with 2 scores representing the probability that it does or does not contain an object. In other words, it is used to find up to a predefined number (~2000) of regions (bounding boxes) which may contain objects.
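To make "k proposals with 2 scores at each location" concrete, here is a minimal sketch that places k reference boxes on every cell of a feature map and counts the scores an RPN-style head would emit. The function name `make_reference_boxes`, the 4 x 6 feature map, the stride of 16, and the 3 scales x 3 aspect ratios (k = 9) are assumptions for illustration, not values taken from the text above.

```python
from itertools import product


def make_reference_boxes(feat_h, feat_w, stride, scales, ratios):
    """Return (x1, y1, x2, y2) boxes centred on every feature-map cell.

    `ratios` are height/width; each box keeps an area of scale**2 pixels.
    """
    boxes = []
    for row, col in product(range(feat_h), range(feat_w)):
        # Centre of this feature-map cell in input-image pixels.
        cx = (col + 0.5) * stride
        cy = (row + 0.5) * stride
        for scale, ratio in product(scales, ratios):
            w = scale / ratio ** 0.5
            h = scale * ratio ** 0.5
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes


scales = (128, 256, 512)   # assumed box sizes in pixels
ratios = (0.5, 1.0, 2.0)   # assumed height/width ratios
k = len(scales) * len(ratios)

boxes = make_reference_boxes(4, 6, stride=16, scales=scales, ratios=ratios)
num_scores = len(boxes) * 2  # object / not-object score per box

print(k)           # 9
print(len(boxes))  # 216 = 4 * 6 * 9
print(num_scores)  # 432
```

In a real detector the candidate boxes would then be ranked by their object score and only the top few (for example the ~2000 mentioned above) passed on.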
An important problem within object detection is generating a variable-length list of bounding boxes. The RPN solves the variable-length problem by using anchors: fixed-size reference bounding boxes which are placed uniformly throughout the original image. Instead of having to detect where objects are, we split the problem into two parts. For every anchor, we ask:
After having a list of possible relevant objects and their locations in the original image, it becomes a more straightforward problem to solve. Using the features extracted by the CNN and the bounding boxes with relevant objects, we apply Region of Interest (RoI) Pooling and extract those features which would correspond to the relevant objects into a new tensor.
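The pooling step can be sketched in plain Python as follows. This is a simplified, single-channel version under stated assumptions: the RoI is already given in feature-map cells with inclusive corners, the output grid is 2 x 2, and the name `roi_max_pool` is hypothetical. Library implementations also handle batches, channels, and the scaling from image to feature-map coordinates.

```python
import math


def roi_max_pool(feature_map, roi, out_size=2):
    """Max-pool the region roi=(x1, y1, x2, y2) of a 2-D feature map
    (inclusive corners, in feature-map cells) to out_size x out_size."""
    x1, y1, x2, y2 = roi
    roi_h, roi_w = y2 - y1 + 1, x2 - x1 + 1
    pooled = []
    for i in range(out_size):
        # Floor for the start and ceil for the end keeps every bin non-empty.
        r0 = y1 + math.floor(i * roi_h / out_size)
        r1 = y1 + math.ceil((i + 1) * roi_h / out_size)
        row = []
        for j in range(out_size):
            c0 = x1 + math.floor(j * roi_w / out_size)
            c1 = x1 + math.ceil((j + 1) * roi_w / out_size)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# Toy 5 x 6 feature map whose value at (row, col) is row * 6 + col.
feature_map = [[r * 6 + c for c in range(6)] for r in range(5)]

small = roi_max_pool(feature_map, roi=(0, 0, 1, 1))  # 2 x 2 region
large = roi_max_pool(feature_map, roi=(1, 0, 5, 4))  # 5 x 5 region

print(small)  # [[0, 1], [6, 7]]
print(large)  # [[15, 17], [27, 29]]
```

Both regions come out as the same 2 x 2 shape, which is what lets regions of different sizes be stacked into one tensor for the second stage.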
Next, in the second stage, the R-CNN module uses the above information to:
One-stage detectors:
Object classification and bounding-box regression are done directly without using pre-generated region proposals (candidate object bounding-boxes).
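A minimal sketch of what "directly" means here, assuming a hypothetical head that has already produced one box and one list of class scores for every sampled location. The function name `decode_dense_predictions`, the threshold, and all numbers are invented for illustration.

```python
def decode_dense_predictions(predictions, score_threshold=0.5):
    """Turn per-location (box, class_scores) pairs into detections.

    There is no proposal step: every location is classified and its box
    kept or dropped based only on its own best class score.
    """
    detections = []
    for box, class_scores in predictions:
        best_class = max(range(len(class_scores)), key=class_scores.__getitem__)
        best_score = class_scores[best_class]
        if best_score >= score_threshold:
            detections.append((box, best_class, best_score))
    return detections


# Made-up outputs for three locations and three classes.
predictions = [
    ((10, 10, 50, 50), [0.1, 0.8, 0.1]),
    ((0, 0, 5, 5), [0.3, 0.3, 0.4]),
    ((20, 30, 80, 90), [0.05, 0.05, 0.9]),
]

detections = decode_dense_predictions(predictions)
print(detections)
# [((10, 10, 50, 50), 1, 0.8), ((20, 30, 80, 90), 2, 0.9)]
```

Real one-stage detectors typically follow this with non-maximum suppression to remove overlapping boxes for the same object.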
Two-stage detectors:
Two-stage detectors usually reach better accuracy but are slower than one-stage detectors.
(image taken from "On the Performance of One-Stage and Two-Stage Object Detectors in Autonomous Vehicles Using Camera Data")