How to validate xml using python without third-party libs?

I have some XML snippets like this:

<!DOCTYPE mensaje SYSTEM "record.dtd">
<record>
    <player_birthday>1979-09-23</player_birthday>
    <player_name>Orene Ai'i</player_name>
    <player_team>Blues</player_team>
    <player_id>453</player_id>
    <player_height>170</player_height>
    <player_position>F&W</player_position>   <---- a '&' here.
    <player_weight>75</player_weight>
</record>

Is there any way to check whether these XML snippets are well-formed? And is there any way to validate the XML against a DTD or an XML Schema?

For various reasons I can't use any third-party packages.

For example, the XML above is not correct because it contains a bare '&'. Note that the DOCTYPE declaration refers to a DTD.
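The record is rejected because of the bare '&': in XML character data a literal ampersand has to be written as the entity &amp; (or placed inside a CDATA section). When the values come from Python strings, the standard library's xml.sax.saxutils.escape can do this before the markup is assembled. A minimal sketch; the fields dict is illustrative and only reuses two values from the record above:

```python
from xml.etree import ElementTree as ET
from xml.sax.saxutils import escape

# Illustrative values taken from the record above (not a complete record).
fields = {"player_name": "Orene Ai'i", "player_position": "F&W"}

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
body = "".join(
    "<{0}>{1}</{0}>".format(tag, escape(value)) for tag, value in fields.items()
)
xml_text = "<record>" + body + "</record>"

root = ET.fromstring(xml_text)             # parses without error now
print(root.find("player_position").text)  # the parser turns &amp; back into &
```

Building the document with ElementTree itself (setting .text on elements and serializing with ET.tostring) escapes the text automatically, which avoids hand-assembled markup altogether.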

asked Dec 06 '12 11:12 by WoooHaaaa




1 Answer

Just try to parse it with ElementTree (xml.etree.ElementTree.fromstring): it raises xml.etree.ElementTree.ParseError if the XML is not well-formed.

>>> a = """<record>
...     <player_birthday>1979-09-23</player_birthday>
...     <player_name>Orene Ai'i</player_name>
...     <player_team>Blues</player_team>
...     <player_id>453</player_id>
...     <player_height>170</player_height>
...     <player_position>F&W</player_position>   <---- a '&' here.
...     <player_weight>75</player_weight>
... </record>"""
>>> 
>>> from xml.etree import ElementTree as ET
>>> x = ET.fromstring(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1282, in XML
    parser.feed(text)
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1624, in feed
    self._raiseerror(v)
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1488, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 7, column 24
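The same idea can be wrapped in a reusable check that reports where the document breaks; ParseError's position attribute holds the (line, column) pair reported by expat. Keep in mind that the standard-library parsers (expat, and ElementTree, minidom and SAX on top of it) are non-validating: they check well-formedness only and do not validate against record.dtd or an XSD. In the sketch below, REQUIRED and check_record are hypothetical names, and the required-element test is a hand-written substitute for what a DTD would declare, not real DTD validation:

```python
from xml.etree import ElementTree as ET

# Hypothetical list of child elements a <record> must contain. A DTD would
# declare this, but the standard library cannot read or enforce a DTD.
REQUIRED = ("player_birthday", "player_name", "player_team", "player_id",
            "player_height", "player_position", "player_weight")


def check_record(text):
    """Return a list of problems found in text; an empty list means OK."""
    try:
        root = ET.fromstring(text)  # well-formedness check done by expat
    except ET.ParseError as err:
        line, column = err.position  # location reported by the parser
        return ["not well-formed at line %d, column %d" % (line, column)]

    # Hand-written structural checks standing in for DTD validation.
    problems = []
    if root.tag != "record":
        problems.append("root element is <%s>, expected <record>" % root.tag)
    for tag in REQUIRED:
        if root.find(tag) is None:
            problems.append("missing <%s>" % tag)
    return problems


bad = "<record><player_position>F&W</player_position></record>"
print(check_record(bad))

good = "<record>" + "".join("<%s>x</%s>" % (t, t) for t in REQUIRED) + "</record>"
print(check_record(good))  # []
```

For anything beyond simple presence checks (element order, attribute types, content models), a real validating parser is needed, which the standard library does not include.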
answered Sep 22 '22 05:09 by jsbueno