I am building a custom vision application with Microsoft's CustomVision.ai.
I am using this tutorial.
When you tag images in object detection projects, you need to specify the region of each tagged object using normalized coordinates.
I have an XML file containing the annotations for each image, e.g. for sample_1.jpg:
<annotation>
<filename>sample_1.jpg</filename>
<size>
<width>410</width>
<height>400</height>
<depth>3</depth>
</size>
<object>
<bndbox>
<xmin>159</xmin>
<ymin>15</ymin>
<xmax>396</xmax>
<ymax>302</ymax>
</bndbox>
</object>
</annotation>
I have to convert the bounding box coordinates from xmin, ymin, xmax, ymax to the normalized x, y, w, h coordinates described in the tutorial.
Can anyone provide me a conversion function?
Assuming (xmin, ymin) and (xmax, ymax) are the top-left and bottom-right corners of your bounding box, respectively, then:
x = xmin
y = ymin
w = xmax - xmin
h = ymax - ymin
You then need to normalize these, which means expressing them as a proportion of the whole image, so simply divide each value by the corresponding image dimension (width or height) from the XML above:
x = xmin / width
y = ymin / height
w = (xmax - xmin) / width
h = (ymax - ymin) / height
This assumes a top-left origin; you will have to apply a shift if this is not the case.
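The formulas above can be wrapped into a small function. The following is a minimal sketch, assuming the annotation follows the XML layout shown in the question and uses a top-left origin; the names voc_to_normalized and regions_from_annotation are hypothetical helpers, not part of any SDK.

```python
import xml.etree.ElementTree as ET


def voc_to_normalized(xmin, ymin, xmax, ymax, width, height):
    """Convert pixel corner coordinates to (x, y, w, h) as fractions of the image size."""
    x = xmin / width
    y = ymin / height
    w = (xmax - xmin) / width
    h = (ymax - ymin) / height
    return x, y, w, h


def regions_from_annotation(xml_text):
    """Return one normalized (x, y, w, h) tuple per <bndbox> in the annotation."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    width = int(size.find("width").text)
    height = int(size.find("height").text)

    regions = []
    for box in root.iter("bndbox"):
        # Read the four corner values in a fixed order.
        xmin, ymin, xmax, ymax = (
            int(box.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        regions.append(voc_to_normalized(xmin, ymin, xmax, ymax, width, height))
    return regions


# The annotation from the question, used here as sample input.
SAMPLE_XML = """
<annotation>
  <filename>sample_1.jpg</filename>
  <size><width>410</width><height>400</height><depth>3</depth></size>
  <object>
    <bndbox><xmin>159</xmin><ymin>15</ymin><xmax>396</xmax><ymax>302</ymax></bndbox>
  </object>
</annotation>
"""

for region in regions_from_annotation(SAMPLE_XML):
    print(region)
```

For the sample box this gives y = 15 / 400 = 0.0375 and h = 287 / 400 = 0.7175; x and w are 159 / 410 and 237 / 410 respectively.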
There is a more straightforward way to do this with pybboxes. Install it with:
pip install pybboxes
In your case,
import pybboxes as pbx
voc_bbox = (159, 15, 396, 302)
W, H = 410, 400 # WxH of the image
pbx.convert_bbox(voc_bbox, from_type="voc", to_type="coco")
>>> (159, 15, 237, 287)
Note that the COCO result above is still in pixels. Converting to a normalized format such as YOLO requires the image width and height for scaling.
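One caveat worth keeping in mind: YOLO boxes are normalized, but they are expressed as (x_center, y_center, w, h), whereas the question asks for a normalized top-left x, y plus w, h. The sketch below uses plain Python rather than pybboxes to make the difference explicit; both function names are hypothetical and a top-left origin is assumed.

```python
def voc_to_yolo(xmin, ymin, xmax, ymax, width, height):
    """Pixel corners -> normalized (x_center, y_center, w, h), the YOLO convention."""
    w = (xmax - xmin) / width
    h = (ymax - ymin) / height
    x_center = (xmin + xmax) / 2 / width
    y_center = (ymin + ymax) / 2 / height
    return x_center, y_center, w, h


def yolo_to_top_left(x_center, y_center, w, h):
    """Normalized center-based box -> normalized (left, top, w, h)."""
    return x_center - w / 2, y_center - h / 2, w, h


W, H = 410, 400  # WxH of the image
yolo_box = voc_to_yolo(159, 15, 396, 302, W, H)
print(yolo_box)                     # center-based, normalized
print(yolo_to_top_left(*yolo_box))  # top-left based, normalized
```

So if you do convert to YOLO format, shift the center by half the width and height to recover the top-left corner before uploading the regions.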