
Automatic XSD validation

According to the lxml documentation, "The DTD is retrieved automatically based on the DOCTYPE of the parsed document. All you have to do is use a parser that has DTD validation enabled."

http://lxml.de/validation.html#validation-at-parse-time

However, if you want to validate against an XML schema, you need to explicitly reference one.

I am wondering why this is, and whether there is a library or function that can do this, or at least an explanation of how to implement it myself. The problem is that there seem to be many ways to reference an XSD, and I need to support all of them.

Validation is not the issue. The issue is how to determine the schemas to validate against. Ideally this would handle inline schemas as well.

Update:

Here is an example.

simpletest.xsd:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="name" type="xs:string"/>
</xs:schema>

simpletest.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<name xmlns="http://www.example.org"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.example.org simpletest.xsd">foo</name>

I would like to do something like the following:

>>> parser = etree.XMLParser(xsd_validation=True)
>>> tree = etree.parse("simpletest.xml", parser)

Jono asked Mar 23 '12 17:03



1 Answer

I have a project with over 100 different schemas and XML trees. To manage and validate all of them, I did a few things.

1) I created a file (e.g. xmlTrees.py) containing a dictionary for every XML document, recording its path and the schema associated with it. This gave me a single place to look up both the XML and the schema used to validate it.

MY_XML = {'url':'/pathToTree/myTree.xml', 'schema':'myXSD.xsd'}
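
As a rough sketch of how such a registry might be consulted, assuming the entries are gathered in one dictionary keyed by name and that all schemas live in a single directory (XML_TREES, SCHEMA_DIR and paths_for are hypothetical names, not taken from the original project):

```python
from pathlib import PurePosixPath

# Hypothetical registry: one entry per XML document, in the MY_XML shape above.
XML_TREES = {
    "MY_XML": {"url": "/pathToTree/myTree.xml", "schema": "myXSD.xsd"},
}

# Assumed location of the .xsd files; adjust to the real project layout.
SCHEMA_DIR = PurePosixPath("/pathToSchemas")


def paths_for(name):
    """Return (xml_path, schema_path) for a registered document name."""
    entry = XML_TREES[name]
    return PurePosixPath(entry["url"]), SCHEMA_DIR / entry["schema"]
```

A test or script would then call paths_for("MY_XML") instead of hard-coding either path.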

2) The project has just as many namespaces, which are very hard to manage. So I again created a single file containing all the namespaces, in the prefix-to-URI dictionary format lxml expects. In my tests and scripts I then always pass this superset of namespaces.

ALL_NAMESPACES = {
    'namespace1':  'http://www.example.org',
    'namespace2':  'http://www.example2.org'
}
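
For instance, lxml's xpath() accepts such a dictionary through its namespaces argument, and prefixes that a given document never uses do no harm. A minimal sketch, with an inline document made up for illustration:

```python
from lxml import etree

ALL_NAMESPACES = {
    'namespace1': 'http://www.example.org',
    'namespace2': 'http://www.example2.org',
}

# Made-up document that only uses the first namespace.
doc = etree.fromstring(b'<name xmlns="http://www.example.org">foo</name>')

# The prefix in the query comes from ALL_NAMESPACES, not from the document,
# so it works even though the document declares a default namespace.
names = doc.xpath('/namespace1:name/text()', namespaces=ALL_NAMESPACES)
```

Here names holds the text content of every matching element.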

3) For basic, generic validation I ended up writing a function I could call:

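import logging
from io import BytesIO

from lxml import etree


def _as_file(data):
    # Parse from bytes, which avoids problems with str input that
    # carries an XML encoding declaration.
    return BytesIO(data.encode("utf-8") if isinstance(data, str) else data)


def validateXML(content, schemaContent):
    """Validate XML content against XSD content; both are str or bytes."""
    response = {}

    try:
        xmlSchema = etree.XMLSchema(etree.parse(_as_file(schemaContent)))
        xml = etree.parse(_as_file(content))
    except (etree.XMLSyntaxError, etree.XMLSchemaParseError):
        # Return early: there is nothing to validate if parsing failed.
        logging.critical("Could not parse schema or content to validate xml")
        response['valid'] = False
        response['errorlog'] = "Could not parse schema or content to validate xml"
        return response

    # Validate the content against the schema.
    try:
        xmlSchema.assertValid(xml)
        response['valid'] = True
        response['errorlog'] = None
    except etree.DocumentInvalid:
        response['valid'] = False
        response['errorlog'] = xmlSchema.error_log

    return response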

Basically, any function that wants to use this needs to pass the XML content and the XSD content as strings. This gave me the most flexibility. I then placed this function in a file alongside all my other XML helper functions.
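
The helper above takes the schema explicitly. For the case in the question, where the document itself points at its schema, one possible approach is to read the xsi:schemaLocation and xsi:noNamespaceSchemaLocation attributes from the root element and load whatever they reference. The sketch below is only that, a sketch: schema_hints and validate_with_hints are hypothetical names, it looks at the root element only, it assumes the locations are local file paths relative to the XML file (not URLs), it validates against each referenced schema separately, and it does not handle inline schemas.

```python
import os

from lxml import etree

XSI = "http://www.w3.org/2001/XMLSchema-instance"


def schema_hints(root):
    """Return the schema locations referenced on the root element, in order."""
    hints = []
    no_ns = root.get("{%s}noNamespaceSchemaLocation" % XSI)
    if no_ns:
        hints.append(no_ns)
    # xsi:schemaLocation is a whitespace-separated list of
    # "namespace location" pairs; keep every second token.
    pairs = (root.get("{%s}schemaLocation" % XSI) or "").split()
    hints.extend(pairs[1::2])
    return hints


def validate_with_hints(xml_path):
    """Parse xml_path and validate it against each schema it points to.

    Returns a list of (schema_location, is_valid) tuples; relative
    locations are resolved against the directory of xml_path.
    """
    tree = etree.parse(xml_path)
    base_dir = os.path.dirname(os.path.abspath(xml_path))
    results = []
    for hint in schema_hints(tree.getroot()):
        schema = etree.XMLSchema(etree.parse(os.path.join(base_dir, hint)))
        results.append((hint, schema.validate(tree)))
    return results
```

Calling validate_with_hints("simpletest.xml") on the question's files would find simpletest.xsd, but the document would most likely be reported as invalid: the schema declares no targetNamespace, while the name element is in the http://www.example.org namespace. Adding targetNamespace="http://www.example.org" to the xs:schema element should make the two match. Once an XMLSchema object exists, it can also be passed to etree.XMLParser(schema=...) to validate at parse time.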

Jtello answered Oct 06 '22 13:10