Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

One stage vs two stage object detection

I was going through YOLOv4 paper which often uses the term one & two stage object detection. I was unable to understand what's the difference between the two types of object detectors. I am assuming

  • One stage does both region detection + object classification using one network only
  • two stage does the above operations using 2 different networks

Is this assumption correct?

like image 968
Mehul Gupta Avatar asked Jan 28 '21 17:01

Mehul Gupta


People also ask

Why we still use a two-stage object detector?

Multi-stage (Two-stage) object detection The task aims to draw multiple bounding boxes of objects in a given image, which is very important in many fields including autonomous driving. Generally, these object detection algorithms can be classified into two categories: Single-stage models and multi-stage models.

What is single-stage object detector?

One-Stage Object Detection Models refer to a class of object detection models which are one-stage, i.e. models which skip the region proposal stage of two-stage models and run detection directly over a dense sampling of locations. These types of model usually have faster inference (possibly at the cost of performance).

Is Yolo one-stage or two-stage?

YOLO works on the single-stage detection principle meaning it unifies all the components of the object detection pipeline into a single neural network. It uses the features from the entire image to predict class probabilities and bounding box coordinates.

What is the difference between one-stage and two-stage object detection?

After the image input, a backbone network is used for feature extraction. The module of Two-stage Object Detection is employed for separated calculation of object localization and classification, in contrast to that of One-stage Object Detection which, taking the computational speed into account combines the object localization and ... ...

How are object detectors based on multi stage models?

One branch of object detectors is based on multi-stage models. Deriving from the work of R-CNN, one model is used to extract regions of objects, and a second model is used to classify and further refine the localization of the object.

Is there a boom in the field of two-stage object detection?

that a boom in the field of two-stage object detection is observed from the years 2017 to 2020. The year 2020 is marked with the highest number of publications. Fig. 2. Yearly publication count for Two Stage Object Detection

What is the difference between high inference speeds and two-stage detectors?

high inference speeds and two-stage detectors have high localization and recognition accuracy. The two stages of a two-stage detector can be divided by a RoI (Region of Interest) Pooling layer. One of the prominent two-stage object detectors is Faster R-CNN. It has the first stage called RPN,


2 Answers

Instead of "region detection + object classification", its "(1)region proposal + (2)classification and localization in two stage detectors.

(1-region proposal) is done by what is called a Region Proposal Network (RPN, for short). RPN is used to decide “where” to look in order to reduce the computational requirements of the overall inference process. The RPN quickly and efficiently scans every location in order to assess whether further processing needs to be carried out in a given region. It does that by outputting k bounding box proposals each with 2 scores representing probability of object or not at each location. In other words, it is used to find up to a predefined number(~2000) of regions (bounding boxes), which may contain objects.

An important problem within object detection is generating a variable-length list of bounding boxes. The variable-length problem is solved in the RPN by using anchors: fixed sized reference bounding boxes which are placed uniformly throughout the original image. Instead of having to detect where objects are, we model the problem into two parts. For every anchor, we ask:

  • Does this anchor contain a relevant object?
  • How would we adjust this anchor to better fit the relevant object?

After having a list of possible relevant objects and their locations in the original image, it becomes a more straightforward problem to solve. Using the features extracted by the CNN and the bounding boxes with relevant objects, we apply Region of Interest (RoI) Pooling and extract those features which would correspond to the relevant objects into a new tensor.

Next in second stage, R-CNN module uses above information to:

  • Classify the content in the bounding box (or discard it, using “background” as a label).
  • Adjust the bounding box coordinates (so it better fits the object).
like image 178
Abhi25t Avatar answered Oct 24 '22 06:10

Abhi25t


One-stage detectors:

Object classification and bounding-box regression are done directly without using pre-generated region proposals (candidate object bounding-boxes).

Two-stage detectors:

  1. Generation of region proposals, e.g. by selective search as in R-CNN and Fast R-CNN, or by a Region Proposal Network (RPN) as in Faster R-CNN.
  2. Object classification for each region proposal. Additionally other things can be done such as bounding-box regression for refining the region proposals, binary-mask prediction etc.

Two-stage detectors usually reach better accuracy but are slower than one-stage detectors.

enter image description here (image taken from "On the Performance of One-Stage and Two-Stage Object Detectors in Autonomous Vehicles Using Camera Data")

like image 40
Andreas K. Avatar answered Oct 24 '22 06:10

Andreas K.