How to interpret anchor boxes in YOLO or R-CNN?

Question

Algorithms like YOLO and R-CNN use the concept of anchor boxes for predicting objects: https://pjreddie.com/darknet/yolo/

The anchor boxes are derived from a specific dataset. The set for the COCO dataset is:

anchors =  0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434, 7.88282, 3.52778, 9.77052, 9.16828

However, I don't understand how to interpret these anchor boxes. What does a pair of values such as (0.57273, 0.677385) mean?

spl · Accepted Answer

In the original YOLO (YOLOv1), predictions were made without any assumption about the shape of the target objects. Say the network is trying to detect humans. We know that humans generally fit in a tall, vertical rectangle rather than a square. However, the original YOLO treated tall rectangles and squares as equally likely box shapes for a human.

But this is inefficient, because the network has to learn typical object shapes from scratch. So YOLOv2 builds in some assumptions about the shapes of the objects. These are the anchor boxes. They are usually passed to the network as a flat list of numbers, read as a series of (width, height) pairs:

anchors = [0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434, 7.88282, 3.52778, 9.77052, 9.16828]

In the above example, (0.57273, 0.677385) represents a single anchor box, whose two elements are its width and height respectively. That is, this list defines 5 different anchor boxes. Note that these values are expressed in units of output grid cells, not pixels. For example, YOLOv2 with a 416x416 input produces a 13x13 feature map, so each cell covers 32x32 input pixels. You get an anchor's size in pixels by multiplying its values by 32, and its size as a fraction of the image by dividing them by 13.
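To make the interpretation concrete, here is a minimal sketch that groups the flat list into (width, height) pairs and converts them to pixels. It assumes the anchor values are in units of output grid cells and that the network uses the default YOLOv2 setup of a 416x416 input with a 13x13 output grid. The names INPUT_SIZE, GRID_SIZE, STRIDE and to_pairs are illustrative and do not come from Darknet.

```python
# Anchor list copied from the question (flat sequence of width, height pairs).
anchors = [0.57273, 0.677385, 1.87446, 2.06253, 3.33843,
           5.47434, 7.88282, 3.52778, 9.77052, 9.16828]

# Assumed YOLOv2 defaults: 416x416 input and a 13x13 output grid.
INPUT_SIZE = 416
GRID_SIZE = 13
STRIDE = INPUT_SIZE // GRID_SIZE  # input pixels covered by one grid cell


def to_pairs(flat):
    """Group a flat [w0, h0, w1, h1, ...] list into (w, h) tuples."""
    if len(flat) % 2 != 0:
        raise ValueError("anchor list must hold an even number of values")
    return list(zip(flat[0::2], flat[1::2]))


anchor_pairs = to_pairs(anchors)

for w_cells, h_cells in anchor_pairs:
    # Size in input pixels: grid-cell units times the stride.
    w_px, h_px = w_cells * STRIDE, h_cells * STRIDE
    # Size as a fraction of the image: grid-cell units over the grid size.
    w_frac, h_frac = w_cells / GRID_SIZE, h_cells / GRID_SIZE
    print(f"{w_cells:8.5f} x {h_cells:8.5f} cells -> "
          f"{w_px:6.1f} x {h_px:6.1f} px "
          f"({w_frac:.0%} x {h_frac:.0%} of the image)")
```

Under these assumptions the smallest anchor is roughly 18x22 pixels and the largest covers most of the image. With a different input resolution the stride stays at 32 but the grid size changes, so the same anchors cover a different fraction of the image.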

Using anchor boxes is a trade-off: recall improves, but mAP may drop slightly. The YOLOv2 paper says:

Using anchor boxes we get a small decrease in accuracy. YOLO only predicts 98 boxes per image but with anchor boxes our model predicts more than a thousand. Without anchor boxes our intermediate model gets 69.5 mAP with a recall of 81%. With anchor boxes our model gets 69.2 mAP with a recall of 88%. Even though the mAP decreases, the increase in recall means that our model has more room to improve.
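The box counts in the quote follow from the grid size times the number of boxes predicted per cell. A small sketch of that arithmetic, using a hypothetical helper total_boxes: the 7x7 grid with 2 boxes per cell is the YOLOv1 setup, while the 13x13 grid with 5 anchors is the configuration discussed in this answer, which is not necessarily the intermediate model the quote refers to.

```python
def total_boxes(grid_size, boxes_per_cell):
    """Number of boxes predicted per image on a square grid."""
    return grid_size * grid_size * boxes_per_cell


# YOLOv1: 7x7 grid, 2 boxes per cell.
yolov1_boxes = total_boxes(7, 2)

# Assumed setup from this answer: 13x13 grid, 5 anchor boxes per cell.
anchor_boxes = total_boxes(13, 5)

print(yolov1_boxes)  # 98
print(anchor_boxes)  # 845
```

Since 845 is below "more than a thousand", the intermediate model in the quote presumably used more anchors per cell or a finer grid. The quote does not say which.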

Tags:

deep-learning

computer-vision

Earthgod
