Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Reading PASCAL VOC annotations in python

I have annotations in xml files such as this one, which follows the PASCAL VOC convention:

<annotation>
<folder>training</folder>
<filename>chanel1.jpg</filename>
<source>
<database>synthetic initialization</database>
<annotation>PASCAL VOC2007</annotation>
<image>synthetic</image>
<flickrid>none</flickrid>
</source>
<owner>
<flickrid>none</flickrid>
<name>none</name>
</owner>
<size>
<width>640</width>
<height>427</height>
<depth>3</depth>
</size>
<segmented>0</segmented>
<object>
<name>chanel</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>344</xmin>
<ymin>10</ymin>
<xmax>422</xmax>
<ymax>83</ymax>
</bndbox>
</object>
<object>
<name>chanel</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>355</xmin>
<ymin>165</ymin>
<xmax>443</xmax>
<ymax>206</ymax>
</bndbox>
</object>
</annotation>

What is the cleanest way of retrieving for example the fields filename and bndbox in Python?

I was trying to ElementTree, which seems to be the official Python solution, but I can't make it work.

My code so far:

from xml.etree import ElementTree as ET
tree = ET.parse("data/all/annotations/" + file)
fn = tree.find('filename').text
boxes = tree.findall('bndbox')

this produces

fn == 'chanel1.jpg'
boxes == []

So it succesfully extracts the filename field, but not the bndbox'es.

like image 243
Jsevillamol Avatar asked Nov 15 '18 10:11

Jsevillamol


People also ask

What is VOC format?

Pascal Visual Object Classes(VOC) Pascal VOC is an XML file, unlike COCO which has a JSON file. In Pascal VOC we create a file for each of the image in the dataset. In COCO we have one file each, for entire dataset for training, testing and validation. The bounding Box in Pascal VOC and COCO data formats are different.

What is a Pascal VOC XML annotation?

The following code snippet is an example of a PASCAL VOC XML annotation: Based on its specifications, the annotations are to be defined in human-readable XML format with the same name as the image (except for extension) It should have the following items: folder — the parent directory of the image.

What is Pascal-VOC-writer?

GitHub - AndrewCarterUK/pascal-voc-writer: A python library for generating annotations in the PASCAL VOC format. This library can be used to create image annotation XML files in the PASCAL VOC file format.

What is the PASCAL VOC project?

The PASCAL Visual Object Classes (VOC) project is one of the earliest computer vision project that aims to standardize the datasets and annotations format. The annotations can be used for image classification and object detection tasks.

What are the annotations used for?

The annotations can be used for image classification and object detection tasks. The following code snippet is an example of a PASCAL VOC XML annotation: Based on its specifications, the annotations are to be defined in human-readable XML format with the same name as the image (except for extension) It should have the following items:


1 Answers

That's a quite easy solution for your problem:

This will return your box coordinates in a nested list [xmin, ymin, xmax, ymax] and the filename Once I struggled with bndbox tags which where mixed up (ymin, xmin,...) or any other strange combinations, so this code read the tags not only the position.

Finally I updated the code. Thanks to craq and Pritesh Gohil, you were absolutely right.

Hope it helps...

import xml.etree.ElementTree as ET


def read_content(xml_file: str):

    tree = ET.parse(xml_file)
    root = tree.getroot()

    list_with_all_boxes = []

    for boxes in root.iter('object'):

        filename = root.find('filename').text

        ymin, xmin, ymax, xmax = None, None, None, None

        ymin = int(boxes.find("bndbox/ymin").text)
        xmin = int(boxes.find("bndbox/xmin").text)
        ymax = int(boxes.find("bndbox/ymax").text)
        xmax = int(boxes.find("bndbox/xmax").text)

        list_with_single_boxes = [xmin, ymin, xmax, ymax]
        list_with_all_boxes.append(list_with_single_boxes)

    return filename, list_with_all_boxes

name, boxes = read_content("file.xml")
like image 63
pix_1 Avatar answered Sep 19 '22 12:09

pix_1