 

Anchor boxes vs. bounding boxes in YOLO or Faster R-CNN

I don't understand the difference between anchor boxes, bounding boxes, and proposal areas; the definitions confuse me. I also don't see what these boxes mean in a detection model, since their default sizes never change. Finally, I am unsure what the R-CNN series and the YOLO series actually output: the predicted box location (x, y, w, h) directly, or a delta such as (ground_truth_x - predicted_x) / predicted_w?

Luv asked May 21 '18 14:05



2 Answers

Anchor boxes: predefined reference rectangles. The network picks one and predicts offsets from it to give the location of a detected object.

Bounding box: the predicted rectangle for a detected object, expressed relative to an anchor box.

The idea is comparable to the landmarks used in models like the one behind Snapchat's camera. A set of nodes is placed in advance on specific regions of the image, based on how selfie portraits are typically framed, and the network learns how to offset those nodes to fit each face it is given before a filter or mask is applied.
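
To make "offsets relative to an anchor" concrete, here is a minimal sketch using the box parameterization from the Faster R-CNN paper: center offsets are divided by the anchor size, and width and height are encoded as log ratios. The function names `encode` and `decode` and the example numbers are my own, chosen only for illustration; YOLO uses a somewhat different parameterization.

```python
import math

def encode(box, anchor):
    """Regression targets for a box relative to an anchor.
    Both are (center_x, center_y, width, height)."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha))

def decode(deltas, anchor):
    """Invert encode(): turn predicted deltas back into a box."""
    tx, ty, tw, th = deltas
    xa, ya, wa, ha = anchor
    return (xa + tx * wa, ya + ty * ha, wa * math.exp(tw), ha * math.exp(th))

# Illustrative numbers only (not from any dataset).
anchor = (100.0, 100.0, 64.0, 128.0)       # fixed, predefined
ground_truth = (110.0, 90.0, 80.0, 120.0)  # labeled object

deltas = encode(ground_truth, anchor)      # what the network is trained to output
recovered = decode(deltas, anchor)         # what is drawn at prediction time
print(deltas)
print(recovered)
```

So the anchor itself never changes; the network only learns the four deltas, and the final bounding box is the anchor after those deltas are applied.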

LiNKeR answered Oct 18 '22 10:10


Bounding Boxes

Bounding boxes are the boxes predicted by the network. These predicted boxes are overlaid on the input image so that you can see the position and shape of each rectangle the network detected. That is, they are the rectangles you can see in this YouTube video.

Anchor Boxes

We can make some assumptions about the shapes of bounding boxes. For example, if we want to detect humans, we should search for them with tall, vertical rectangles. These assumed shapes are the anchor boxes. The anchor boxes are given to the network, before training and prediction, as a list of numbers that is a series of (width, height) pairs:

anchors = [1.08, 1.19, 3.42, 4.41, 6.63, 11.38, 9.42, 5.11, 16.62, 10.52]

The list above defines 5 anchor boxes. We can give the network an arbitrary number of anchor boxes.
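
As a small sketch of how the flat list maps to boxes, assuming the values alternate width, height as described above (the variable name `anchor_boxes` is mine):

```python
anchors = [1.08, 1.19, 3.42, 4.41, 6.63, 11.38, 9.42, 5.11, 16.62, 10.52]

# Pair up even-indexed entries (widths) with odd-indexed entries (heights).
anchor_boxes = list(zip(anchors[0::2], anchors[1::2]))

for width, height in anchor_boxes:
    # The aspect ratio shows whether an anchor is wide (> 1) or tall (< 1).
    print(f"w={width:6.2f}  h={height:6.2f}  aspect={width / height:.2f}")
```

Adding or removing a pair changes the number of anchor boxes; the network's output layer has to be sized to match.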

These values are determined from the training data with some statistical procedure.
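
One procedure I know of for this is the one described in the YOLOv2 paper: run k-means on the (width, height) of the training boxes, using IoU instead of Euclidean distance to decide which cluster a box belongs to. The sketch below is my own simplified version, not the original Darknet code; `iou_wh`, `kmeans_anchors`, and the toy sizes are hypothetical names and data.

```python
import random

def iou_wh(a, b):
    """IoU of two boxes given as (width, height), both placed at the same corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def kmeans_anchors(sizes, k, iters=50, seed=0):
    """Cluster (width, height) pairs into k anchors; nearest means highest IoU."""
    rng = random.Random(seed)
    centers = rng.sample(sizes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for s in sizes:
            best = max(range(k), key=lambda i: iou_wh(s, centers[i]))
            clusters[best].append(s)
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                mean_w = sum(w for w, _ in members) / len(members)
                mean_h = sum(h for _, h in members) / len(members)
                new_centers.append((mean_w, mean_h))
            else:
                new_centers.append(centers[i])  # keep the center of an empty cluster
        if new_centers == centers:
            break  # assignments stopped changing
        centers = new_centers
    return sorted(centers, key=lambda c: c[0] * c[1])

# Toy data: three small tall boxes and three large wide boxes.
sizes = [(10, 20), (12, 22), (11, 19), (50, 40), (52, 44), (48, 41)]
found = kmeans_anchors(sizes, k=2)
print(found)
```

With real data, `sizes` would hold the width and height of every labeled box in the training set, and `k` would be the number of anchors you want.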

spl answered Oct 18 '22 10:10