I have a XML file and I have a XML schema. I want to validate the file against that schema and check if it adheres to that. I am using python but am open to any language for that matter if there is no such useful library in python.
What would be my best options here? I would worry about the how fast I can get this up and running.
You can validate your XML documents against XML schemas only; validation against DTDs is not supported. However, although you cannot validate against DTDs, you can insert documents that contain a DOCTYPE or that refer to DTDs.
XML documents are validated by the Create method of the XmlReader class. To validate an XML document, construct an XmlReaderSettings object that contains an XML schema definition language (XSD) schema with which to validate the XML document. The System. Xml.
XML Validator - XSD (XML Schema) XSD files are "XML Schemas" that describe the structure of a XML document. The validator checks for well formedness first, meaning that your XML file must be parsable using a DOM/SAX parser, and only then does it validate your XML against the XML Schema.
Reference the XSD schema in the XML document using XML schema instance attributes such as either xsi:schemaLocation or xsi:noNamespaceSchemaLocation. Add the XSD schema file to a schema cache and then connect that cache to the DOM document or SAX reader, prior to loading or parsing the XML document.
Definitely lxml
.
Define an XMLParser
with a predefined schema, load the the file fromstring()
and catch any XML Schema errors:
from lxml import etree
def validate(xmlparser, xmlfilename):
try:
with open(xmlfilename, 'r') as f:
etree.fromstring(f.read(), xmlparser)
return True
except etree.XMLSchemaError:
return False
schema_file = 'schema.xsd'
with open(schema_file, 'r') as f:
schema_root = etree.XML(f.read())
schema = etree.XMLSchema(schema_root)
xmlparser = etree.XMLParser(schema=schema)
filenames = ['input1.xml', 'input2.xml', 'input3.xml']
for filename in filenames:
if validate(xmlparser, filename):
print("%s validates" % filename)
else:
print("%s doesn't validate" % filename)
If the schema file contains an xml tag with an encoding (e.g. <?xml version="1.0" encoding="UTF-8"?>
), the code above will generate the following error:
Traceback (most recent call last):
File "<input>", line 2, in <module>
schema_root = etree.XML(f.read())
File "src/lxml/etree.pyx", line 3192, in lxml.etree.XML
File "src/lxml/parser.pxi", line 1872, in lxml.etree._parseMemoryDocument
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
A solution is to open the files in byte mode: open(..., 'rb')
[...]
def validate(xmlparser, xmlfilename):
try:
with open(xmlfilename, 'rb') as f:
[...]
with open(schema_file, 'rb') as f:
[...]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With