How do I validate an XML document against a compact RELAX NG schema in Python?
How about using lxml?
From the docs:
>>> from io import StringIO
>>> from lxml import etree
>>> f = StringIO('''\
... <element name="a" xmlns="http://relaxng.org/ns/structure/1.0">
... <zeroOrMore>
... <element name="b">
... <text />
... </element>
... </zeroOrMore>
... </element>
... ''')
>>> relaxng_doc = etree.parse(f)
>>> relaxng = etree.RelaxNG(relaxng_doc)
>>> valid = StringIO('<a><b></b></a>')
>>> doc = etree.parse(valid)
>>> relaxng.validate(doc)
True
>>> invalid = StringIO('<a><c></c></a>')
>>> doc2 = etree.parse(invalid)
>>> relaxng.validate(doc2)
False
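If you need more than True or False from lxml itself, the validator keeps an error log after a failed validation. Here is a minimal sketch that reuses the schema from the docs example above; the helper name validation_errors is made up for illustration and is not part of lxml:

```python
from io import StringIO
from lxml import etree

# Same XML-syntax RELAX NG schema as in the docs example.
RNG = '''\
<element name="a" xmlns="http://relaxng.org/ns/structure/1.0">
  <zeroOrMore>
    <element name="b">
      <text />
    </element>
  </zeroOrMore>
</element>
'''

relaxng = etree.RelaxNG(etree.parse(StringIO(RNG)))


def validation_errors(xml_text):
    """Return a list of (line, message) tuples; empty if the document is valid.

    Hypothetical helper, shown only to illustrate reading relaxng.error_log.
    """
    doc = etree.parse(StringIO(xml_text))
    if relaxng.validate(doc):
        return []
    # error_log holds the problems found by the most recent validation.
    return [(entry.line, entry.message) for entry in relaxng.error_log]


for line, message in validation_errors('<a><c></c></a>'):
    print(line, message)
```

If you would rather get an exception than a boolean, relaxng.assertValid(doc) raises etree.DocumentInvalid for an invalid document.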
If you want to validate against a RELAX NG Compact Syntax schema from the command line, you can use pyjing, from the jingtrang module. It supports .rnc files and displays more details than just True or False. For example:
C:\>pyjing -c root.rnc invalid.xml
C:\invalid.xml:9:9: error: element "name" not allowed here; expected the element end-tag or element "bounds"
NOTE: jingtrang is a Python wrapper around the Java tools Jing and Trang, so it requires Java to be installed.
If you want to validate from within Python, you can:
Use pytrang (from the jingtrang wrapper) to convert Compact RELAX NG (.rnc) to XML RELAX NG (.rng):
pytrang root.rnc root.rng
Use lxml to parse the converted .rng file, as described at https://lxml.de/validation.html#relaxng
That would be something like this:
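>>> from io import StringIO
>>> from subprocess import call
>>> from lxml import etree
>>> # Pass the command as a list so it also runs without a shell on POSIX
>>> call(["pytrang", "root.rnc", "root.rng"])
0
>>> with open("root.rng", "rb") as f:
...     relaxng_doc = etree.parse(f)
...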
>>> relaxng = etree.RelaxNG(relaxng_doc)
>>> valid = StringIO('<a><b></b></a>')
>>> doc = etree.parse(valid)
>>> relaxng.validate(doc)
True
>>> invalid = StringIO('<a><c></c></a>')
>>> doc2 = etree.parse(invalid)
>>> relaxng.validate(doc2)
False
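The two steps above can also be wrapped in small functions. This is only a sketch under some assumptions: the function names rnc_to_rng and validate_file are made up for illustration, and it assumes jingtrang and Java are installed so that pytrang is on PATH:

```python
import subprocess

from lxml import etree


def rnc_to_rng(rnc_path, rng_path):
    """Convert a compact (.rnc) schema to XML syntax (.rng) using pytrang.

    Assumes pytrang (from the jingtrang package) is on PATH and Java is
    installed. Raises subprocess.CalledProcessError if the conversion fails.
    """
    subprocess.run(["pytrang", str(rnc_path), str(rng_path)], check=True)


def validate_file(xml_path, rng_path):
    """Validate an XML file against an XML-syntax RELAX NG schema.

    Returns (is_valid, errors), where errors is a list of "line: message"
    strings taken from lxml's error log (empty when the document is valid).
    """
    relaxng = etree.RelaxNG(etree.parse(str(rng_path)))
    doc = etree.parse(str(xml_path))
    is_valid = relaxng.validate(doc)
    if is_valid:
        return True, []
    return False, [f"{e.line}: {e.message}" for e in relaxng.error_log]


# Usage (file names taken from the example above):
#   rnc_to_rng("root.rnc", "root.rng")
#   ok, errors = validate_file("invalid.xml", "root.rng")
```

Keeping the conversion separate means the .rng file can be generated once and reused for many documents, so Java is only needed for the conversion step.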