Generating anchor boxes using K-means clustering, YOLO

Question

I am trying to understand how YOLO works and how it detects objects in an image. My question is: what role does k-means clustering play in determining the bounding boxes around objects? Thanks.

Vinay Hegde · Accepted Answer

K-means clustering is a well-known algorithm in data science. It aims to partition n observations into k clusters. It consists mainly of three steps:

Initialization: k means (i.e. centroids) are generated at random.
Assignment: clusters are formed by associating each observation with its nearest centroid.
Update: the centroid of each newly formed cluster is recomputed as the mean of its members.

The assignment and update steps are repeated until convergence. The result is a (local) minimum of the sum of squared errors between the points and their respective centroids.
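The three steps above can be sketched in a few lines of NumPy. This is only a minimal illustration, not a production implementation: the function name `kmeans` and its parameters are made up for this example, and it assumes the initial centroids are drawn from the observations themselves, which is one common way to "generate them at random".

```python
import numpy as np

def kmeans(points, k, n_iters=100, seed=0):
    """Plain k-means on an (n, d) array of points (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Initialization (assumption): use k distinct observations as starting centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iters):
        # Assignment: each point goes to its nearest centroid (squared Euclidean distance).
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update: each centroid becomes the mean of the points assigned to it.
        new_centroids = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                new_centroids[j] = members.mean(axis=0)
        # Convergence: stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```

Calling `kmeans(points, 2)` on two well-separated groups of points should return one centroid near the middle of each group.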

EDIT:

Why use K-means

K-means is computationally fast and efficient compared to many other unsupervised learning algorithms. Don't forget that the time complexity of each iteration is linear in the number of observations.
It tends to produce tighter clusters than hierarchical clustering. Using more clusters helps to get a more accurate end result.
An instance can change cluster (move to another cluster) when the centroids are recomputed.
It often works well even if some of your assumptions are violated.

What it really does in determining anchor boxes

The detector creates thousands of anchor boxes for each predictor, covering a range of shapes, sizes and locations; k-means supplies the anchor shapes (the cluster centroids).
For each anchor box, find the object's bounding box with the highest ratio of overlap area to union area. This ratio is called Intersection over Union, or IoU.
If the highest IoU is greater than 50% (this threshold can be customized), tell the anchor box that it should detect the object with which it has the highest IoU.
Otherwise, if the IoU is greater than 40%, tell the neural network that the true detection is ambiguous and that it should not learn from that example.
If the highest IoU is less than 40%, the anchor box should predict that there is no object.
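As a rough sketch of the matching rules above, the snippet below computes the IoU of two corner-format boxes and labels an anchor using the 50% and 40% thresholds. The names `iou` and `label_anchor`, the string labels, and the choice to treat an IoU of exactly 40% as "no object" are assumptions made for this illustration; they do not come from any particular YOLO implementation.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x0, y0, x1, y1) with x0 < x1 and y0 < y1."""
    # Corners of the intersection rectangle (may be empty).
    ix0, iy0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix1, iy1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def label_anchor(anchor, gt_boxes, pos_thresh=0.5, neg_thresh=0.4):
    """Return ("object", index), ("ignore", None) or ("background", None)."""
    if not gt_boxes:
        return ("background", None)
    ious = [iou(anchor, gt) for gt in gt_boxes]
    best = max(range(len(ious)), key=ious.__getitem__)
    if ious[best] > pos_thresh:
        return ("object", best)      # anchor is responsible for this object
    if ious[best] > neg_thresh:
        return ("ignore", None)      # ambiguous: do not learn from this example
    return ("background", None)      # assumption: IoU <= neg_thresh means no object
```

For example, `label_anchor((0, 0, 10, 10), [(0, 0, 10, 8)])` has an IoU of 0.8 with the only ground-truth box, so the anchor is assigned to object 0.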

Thanks!

shivaraj karki · Answer

In general, bounding boxes for objects are given by tuples of the form (x0, y0, x1, y1), where (x0, y0) are the coordinates of the lower-left corner and (x1, y1) are the coordinates of the upper-right corner.

We need to extract the width and height from these coordinates and normalize them with respect to the image width and height.
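A minimal sketch of that preprocessing step, assuming the corner format described above; the helper name `normalized_wh` is hypothetical:

```python
def normalized_wh(box, image_width, image_height):
    """Turn an (x0, y0, x1, y1) box into (width, height) relative to the image size."""
    x0, y0, x1, y1 = box
    return ((x1 - x0) / image_width, (y1 - y0) / image_height)
```

For a 400x500 image, the box (100, 50, 300, 250) is 200 pixels wide and 200 pixels tall, which becomes (0.5, 0.4) after normalization.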

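Metrics for K-means

Euclidean distance
IoU (Jaccard index)

IoU turns out to be better than the former: with Euclidean distance, large boxes produce larger errors than small ones, whereas IoU does not depend on box size.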

Jaccard index = (Intersection between selected box and cluster head box)/(Union between selected box and cluster head box)
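Since only widths and heights are clustered, the formula can be evaluated by placing both boxes at the same corner, so the intersection is simply the smaller width times the smaller height. A small sketch under that assumption (the name `wh_iou` is made up for this example):

```python
def wh_iou(box, centroid):
    """Jaccard index of two (width, height) boxes that share the same corner."""
    w, h = box
    cw, ch = centroid
    inter = min(w, cw) * min(h, ch)
    union = w * h + cw * ch - inter
    return inter / union
```

For instance, a 2x2 box and a 4x1 cluster head overlap in a 2x1 region, so the index is 2 / (4 + 4 - 2) = 1/3.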

At initialization, we can choose k random boxes as our cluster heads. We then assign the boxes to their respective clusters based on their IoU with the cluster heads (IoU value > threshold), and calculate the mean IoU of each cluster.

This process can be repeated until convergence.
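The whole procedure might look like the sketch below. Note two deliberate simplifications that differ from the description above: instead of a fixed IoU threshold, each box is assigned to the cluster head with which it has the highest IoU (equivalent to using 1 - IoU as the distance), and each cluster head is updated to the mean width and height of its members. The function names and parameters are hypothetical, and `boxes` is assumed to be an (n, 2) array of normalized (width, height) pairs.

```python
import numpy as np

def wh_iou_matrix(boxes, centroids):
    """IoU between every (w, h) box and every centroid, all placed at the same corner."""
    inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0])
             * np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
    area_boxes = boxes[:, 0] * boxes[:, 1]
    area_centroids = centroids[:, 0] * centroids[:, 1]
    return inter / (area_boxes[:, None] + area_centroids[None, :] - inter)

def kmeans_anchors(boxes, k, n_iters=100, seed=0):
    """Cluster (w, h) pairs with IoU as the similarity; returns anchors and mean IoU."""
    rng = np.random.default_rng(seed)
    # Initialization: k random boxes become the cluster heads.
    centroids = boxes[rng.choice(len(boxes), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(n_iters):
        # Assignment: highest IoU is the same as smallest (1 - IoU) distance.
        new_labels = wh_iou_matrix(boxes, centroids).argmax(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # converged: no box changed cluster
        labels = new_labels
        # Update (assumption): cluster head = mean width and height of its members.
        for j in range(k):
            members = boxes[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    # Mean IoU between each box and its best-matching anchor.
    mean_iou = wh_iou_matrix(boxes, centroids).max(axis=1).mean()
    return centroids, mean_iou
```

The returned mean IoU is a convenient way to compare different values of k: more anchors generally raise it, at the cost of a larger model output.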

Tags: k-means, computer-vision, bounding-box, object-detection, yolo

yin yang
