Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

XML (.xsd) feed validation against a schema

I have a XML file and I have a XML schema. I want to validate the file against that schema and check if it adheres to that. I am using python but am open to any language for that matter if there is no such useful library in python.

What would be my best options here? I would worry about the how fast I can get this up and running.

like image 526
Scooby Avatar asked Jul 23 '13 19:07

Scooby


People also ask

Can we validate XML documents against a schema?

You can validate your XML documents against XML schemas only; validation against DTDs is not supported. However, although you cannot validate against DTDs, you can insert documents that contain a DOCTYPE or that refer to DTDs.

How you will validate XML document using schema?

XML documents are validated by the Create method of the XmlReader class. To validate an XML document, construct an XmlReaderSettings object that contains an XML schema definition language (XSD) schema with which to validate the XML document. The System. Xml.

What is XML validation against XSD?

XML Validator - XSD (XML Schema) XSD files are "XML Schemas" that describe the structure of a XML document. The validator checks for well formedness first, meaning that your XML file must be parsable using a DOM/SAX parser, and only then does it validate your XML against the XML Schema.

How do I link XML to schema?

Reference the XSD schema in the XML document using XML schema instance attributes such as either xsi:schemaLocation or xsi:noNamespaceSchemaLocation. Add the XSD schema file to a schema cache and then connect that cache to the DOM document or SAX reader, prior to loading or parsing the XML document.


1 Answers

Definitely lxml.

Define an XMLParser with a predefined schema, load the the file fromstring() and catch any XML Schema errors:

from lxml import etree

def validate(xmlparser, xmlfilename):
    try:
        with open(xmlfilename, 'r') as f:
            etree.fromstring(f.read(), xmlparser) 
        return True
    except etree.XMLSchemaError:
        return False

schema_file = 'schema.xsd'
with open(schema_file, 'r') as f:
    schema_root = etree.XML(f.read())

schema = etree.XMLSchema(schema_root)
xmlparser = etree.XMLParser(schema=schema)

filenames = ['input1.xml', 'input2.xml', 'input3.xml']
for filename in filenames:
    if validate(xmlparser, filename):
        print("%s validates" % filename)
    else:
        print("%s doesn't validate" % filename)

Note about encoding

If the schema file contains an xml tag with an encoding (e.g. <?xml version="1.0" encoding="UTF-8"?>), the code above will generate the following error:

Traceback (most recent call last):
  File "<input>", line 2, in <module>
    schema_root = etree.XML(f.read())
  File "src/lxml/etree.pyx", line 3192, in lxml.etree.XML
  File "src/lxml/parser.pxi", line 1872, in lxml.etree._parseMemoryDocument
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.

A solution is to open the files in byte mode: open(..., 'rb')

[...]
def validate(xmlparser, xmlfilename):
    try:
        with open(xmlfilename, 'rb') as f:
[...]
with open(schema_file, 'rb') as f:
[...]
like image 96
alecxe Avatar answered Sep 21 '22 18:09

alecxe