
Python XML: ParseError: junk after document element

I am trying to parse an XML file into an ElementTree:

>>> import xml.etree.cElementTree as ET
>>> tree = ET.ElementTree(file=r'D:\Temp\Slikvideo\JPEG\SV_4_1_mask\index.xml')

I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Program Files\Anaconda2\lib\xml\etree\ElementTree.py", line 611, in __init__
    self.parse(file)
  File "<string>", line 38, in parse
ParseError: junk after document element: line 3, column 0

The XML file starts like this:

<?xml version="1.0" encoding="UTF-8" ?>
<Version Writer="E:\d\src\Modules\SceneSerialization\src\mitkSceneIO.cpp" Revision="$Revision: 17055 $" FileVersion="1" />
<node UID="OBJECT_2016080819041580480127">
    <source UID="OBJECT_2016080819041550469454" />
    <data type="LabelSetImage" file="hfbaaa_Bolus.nrrd" />
    <properties file="sicaaa" />
</node>
<node UID="OBJECT_2016080819041512769572">
    <source UID="OBJECT_2016080819041598947781" />
    <data type="LabelSetImage" file="ifbaaa_Bolus.nrrd" />
    <properties file="ticaaa" />
</node>

followed by many more nodes.

I do not see any junk at line 3, column 0, so I assume there must be another reason for the error.

The .xml file is generated by external software (MITK), so I assumed it would be valid.

I am working on Windows 7 (64-bit) with VS2015 and Anaconda.

jdelange asked Aug 09 '16 14:08


1 Answer

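As @Matthias Wiehl said, ElementTree expects exactly one root element. Your file has several top-level elements (<Version /> followed by many <node> elements), so it is not well-formed XML, and that should ideally be fixed at its origin. The parser reports line 3, column 0 because the self-closing <Version /> on line 2 is already a complete document element; the first <node>, which starts at the beginning of line 3, is the "junk" that follows it. As a workaround you can add a fake root node to the document.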

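import re
import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9

with open("index.xml") as f:
    xml = f.read()

# Insert <root> right after the XML declaration and close it at the very end,
# so that all top-level elements end up under a single parent.
# Note: fromstring() returns the root Element, not an ElementTree.
root = ET.fromstring(re.sub(r"(<\?xml[^>]+\?>)", r"\1<root>", xml, count=1) + "</root>")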
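
To make the cause and the workaround easier to check, here is a minimal, self-contained sketch. It assumes a shortened stand-in for index.xml (the Version element keeps only its FileVersion attribute), and the names SAMPLE, DECLARATION, parse_fragments and the wrapper tag "root" are hypothetical, chosen only for illustration. Unlike the snippet above, it drops the XML declaration instead of keeping it, so it should also cope with input that has no declaration at all.

```python
import re
import xml.etree.ElementTree as ET

# Shortened stand-in for index.xml (an assumption, not the real file):
# one declaration followed by several top-level elements.
SAMPLE = """<?xml version="1.0" encoding="UTF-8" ?>
<Version FileVersion="1" />
<node UID="OBJECT_2016080819041580480127">
    <data type="LabelSetImage" file="hfbaaa_Bolus.nrrd" />
</node>
<node UID="OBJECT_2016080819041512769572">
    <data type="LabelSetImage" file="ifbaaa_Bolus.nrrd" />
</node>
"""

# Matches an optional XML declaration at the start of the text.
DECLARATION = re.compile(r"^\s*<\?xml[^>]*\?>")


def parse_fragments(text, wrapper="root"):
    """Parse text with several top-level elements under a synthetic wrapper."""
    match = DECLARATION.match(text)
    # Drop the declaration: it is only allowed at the very start of a document.
    body = text[match.end():] if match else text
    return ET.fromstring("<{0}>{1}</{0}>".format(wrapper, body))


# Parsing the raw text fails where the second top-level element begins.
try:
    ET.fromstring(SAMPLE)
    error_position = None
except ET.ParseError as exc:
    error_position = exc.position  # (line, column) tuple

# With the wrapper in place, every original top-level element is a child of root.
root = parse_fragments(SAMPLE)
files = [data.get("file") for data in root.iter("data")]
print(error_position, root.tag, files)
```

The wrapper element exists only in memory; the file on disk is left untouched, so the proper fix is still to have the producing software write a single root element.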

Martin Valgur answered Sep 17 '22 10:09