 

Can lxml be used to check whether an XML file is well-formed, or is it too powerful?

Can lxml be used to check whether XML is well-formed, or is it too powerful? For example, it seems able to parse XML even when it is not well-formed. What's the easiest way to check whether an XML file is well-formed?

Celeritas asked Feb 10 '23 12:02


1 Answer

With its default parser, lxml should throw an exception when parsing XML that is not well-formed. For example:

from lxml import etree

xml = """
<multipleroot>
    <noclosingtag>
</multipleroot>
<multipleroot></multipleroot>"""
doc = etree.fromstring(xml)

This raises an exception:

Traceback (most recent call last):
  File "D:\StackOverflow\Python\Q50.py", line 8, in <module>
    doc = etree.fromstring(xml)
  ......
  ......
XMLSyntaxError: Opening and ending tag mismatch: noclosingtag line 3 and multipleroot, line 4, column 16
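So with the default, non-recovering parser, a well-formedness check amounts to attempting a parse and catching XMLSyntaxError. Below is a minimal sketch, assuming you want a boolean plus the error message; the helper name `is_well_formed` is hypothetical and not part of lxml.

```python
from lxml import etree


def is_well_formed(source):
    """Hypothetical helper: return (True, None) if `source` parses with
    lxml's default (strict) XML parser, else (False, error message).

    `source` may be a filename, a file object or a URL, as accepted by
    etree.parse(). I/O errors (e.g. a missing file) are not caught.
    """
    try:
        # etree.parse() uses a non-recovering XMLParser by default,
        # so a well-formedness error raises XMLSyntaxError.
        etree.parse(source)
    except etree.XMLSyntaxError as exc:
        return False, str(exc)
    return True, None
```

Usage: `ok, error = is_well_formed("data.xml")` (the filename is only an example). Note that this checks well-formedness only, not validity against a DTD or schema.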

However, if you explicitly tell XMLParser to recover from XML that isn't well-formed, or if you use HTMLParser instead, lxml may still be able to parse the XML:

from lxml import etree

xml = """
<multipleroot>
    <noclosingtag>
</multipleroot>
<multipleroot></multipleroot>"""
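parser = etree.XMLParser(recover=True)  # try to recover from malformed input
#parser = etree.HTMLParser()  # alternative: the lenient HTML parser
doc = etree.fromstring(xml, parser=parser)
print(etree.tostring(doc, encoding="unicode"))  # str instead of bytes on Python 3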

This successfully prints the parsed XML:

<multipleroot>
    <noclosingtag>
</noclosingtag>
<multipleroot/></multipleroot>
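If you do want recover=True but still need to know whether the input was well-formed, the parser's `error_log` should record the problems libxml2 worked around. A sketch under that assumption, reusing the malformed document from the examples above (the variable names are illustrative only):

```python
from lxml import etree

# The same malformed document as in the examples above.
xml = """
<multipleroot>
    <noclosingtag>
</multipleroot>
<multipleroot></multipleroot>"""

parser = etree.XMLParser(recover=True)
doc = etree.fromstring(xml, parser=parser)

# No exception is raised in recover mode, but the errors encountered
# during this parser run are kept in parser.error_log.
errors = list(parser.error_log)
for entry in errors:
    print(entry.line, entry.column, entry.message)

# An empty log means nothing had to be recovered.
well_formed = not errors
print("well-formed:", well_formed)
```

This keeps the lenient parse result while still letting you reject or flag documents that needed recovery.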
har07 answered Feb 12 '23 11:02