
Schematron validation with lxml in Python: how to retrieve validation errors?

I'm trying to do some Schematron validation with lxml. For the specific application I'm working on, it's important that any tests that fail validation are reported back. The lxml documentation mentions the validation_report property. I think this should contain the info I'm looking for, but I just can't figure out how to work with it. Here's some example code that demonstrates my problem (adapted from http://lxml.de/validation.html#id2; tested with Python 2.7.4):

import StringIO
from lxml import isoschematron
from lxml import etree

def main():

    # Schema
    f = StringIO.StringIO('''\
    <schema xmlns="http://purl.oclc.org/dsdl/schematron" >
    <pattern id="sum_equals_100_percent">
    <title>Sum equals 100%.</title>
    <rule context="Total">
    <assert test="sum(//Percent)=100">Sum is not 100%.</assert>
    </rule>
    </pattern>
    </schema>
    ''')

    # Parse schema
    sct_doc = etree.parse(f)
    schematron = isoschematron.Schematron(sct_doc, store_report = True)

    # XML to validate - validation will fail because sum of numbers
    # not equal to 100 
    notValid = StringIO.StringIO('''\
        <Total>
        <Percent>30</Percent>
        <Percent>30</Percent>
        <Percent>50</Percent>
        </Total>
        ''')
    # Parse xml
    doc = etree.parse(notValid)

    # Validate against schema
    validationResult = schematron.validate(doc)

    # Validation report (assuming here this is where reason 
    # for validation failure is stored, but perhaps I'm wrong?)
    report = isoschematron.Schematron.validation_report

    print("is valid: " + str(validationResult))
    print(dir(report.__doc__))

main()

Now, from the value of validationResult I can see that the validation failed (as expected), so next I would like to know why. The result of the second print statement gives me:

['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__', '__
format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__get
slice__', '__gt__', '__hash__', '__init__', '__le__', '__len__', '__lt__', '__mo
d__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__',
 '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook
__', '_formatter_field_name_split', '_formatter_parser', 'capitalize', 'center',
 'count', 'decode', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'index
', 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle', 'isupper',
'join', 'ljust', 'lower', 'lstrip', 'partition', 'replace', 'rfind', 'rindex', '
rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', '
strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']

Which is about as far as I'm getting, based on the documentation and this related question. Could well be something really obvious I'm overlooking?

asked Nov 26 '14 at 13:11 by johan



1 Answer

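OK, so someone on Twitter gave me a suggestion that made me realise I had referenced the Schematron class incorrectly: I was reading validation_report from the class itself (isoschematron.Schematron.validation_report) rather than from the schematron instance I had just used for validation. Since there don't seem to be any clear examples, I'll share my working solution below: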

import StringIO
from lxml import isoschematron
from lxml import etree

def main():
    # Example adapted from http://lxml.de/validation.html#id2

    # Schema
    f = StringIO.StringIO('''\
    <schema xmlns="http://purl.oclc.org/dsdl/schematron" >
    <pattern id="sum_equals_100_percent">
    <title>Sum equals 100%.</title>
    <rule context="Total">
    <assert test="sum(//Percent)=100">Sum is not 100%.</assert>
    </rule>
    </pattern>
    </schema>
    ''')

    # Parse schema
    sct_doc = etree.parse(f)
    schematron = isoschematron.Schematron(sct_doc, store_report = True)

    # XML to validate - validation will fail because sum of numbers 
    # not equal to 100 
    notValid = StringIO.StringIO('''\
        <Total>
        <Percent>30</Percent>
        <Percent>30</Percent>
        <Percent>50</Percent>
        </Total>
        ''')
    # Parse xml
    doc = etree.parse(notValid)

    # Validate against schema
    validationResult = schematron.validate(doc)

    # Validation report 
    report = schematron.validation_report

    print("is valid: " + str(validationResult))
    print(type(report))
    print(report)

main()

The print statement on the report now results in the following output:

 <?xml version="1.0" standalone="yes"?>
<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:schold="http://www.ascc.net/xml/schematron" xmlns:sch="http://www.ascc.net/xml/schematron" xmlns:iso="http://purl.oclc.org/dsdl/schematron" title="" schemaVersion="">
  <!--   
           
           
         -->
  <svrl:active-pattern id="sum_equals_100_percent" name="Sum equals 100%."/>
  <svrl:fired-rule context="Total"/>
  <svrl:failed-assert test="sum(//Percent)=100" location="/Total">
    <svrl:text>Sum is not 100%.</svrl:text>
  </svrl:failed-assert>
</svrl:schematron-output>

Which is exactly what I was looking for!
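Since the report is an XML tree in the SVRL namespace, the failed assertions can also be pulled out programmatically instead of printing the whole document. The following is only a minimal sketch, written for Python 3 (so io.BytesIO replaces StringIO); the helper name failed_assertions and the SVRL_NS mapping are my own illustrative choices, not part of lxml:

```python
import io
from lxml import etree, isoschematron

# Namespace used by the SVRL report shown above.
SVRL_NS = {"svrl": "http://purl.oclc.org/dsdl/svrl"}

SCHEMA = b"""<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern id="sum_equals_100_percent">
    <title>Sum equals 100%.</title>
    <rule context="Total">
      <assert test="sum(//Percent)=100">Sum is not 100%.</assert>
    </rule>
  </pattern>
</schema>"""

DOCUMENT = b"""<Total>
  <Percent>30</Percent>
  <Percent>30</Percent>
  <Percent>50</Percent>
</Total>"""


def failed_assertions(schematron, doc):
    """Hypothetical helper: validate doc and list the failed assertions.

    Returns (is_valid, [(location, test, message), ...]).
    """
    is_valid = schematron.validate(doc)
    # Read the report from the instance, not from the class.
    report = schematron.validation_report
    failures = []
    for node in report.xpath("//svrl:failed-assert", namespaces=SVRL_NS):
        message = node.findtext("svrl:text", default="", namespaces=SVRL_NS)
        failures.append((node.get("location"), node.get("test"), message.strip()))
    return is_valid, failures


schematron = isoschematron.Schematron(
    etree.parse(io.BytesIO(SCHEMA)), store_report=True
)
is_valid, failures = failed_assertions(
    schematron, etree.parse(io.BytesIO(DOCUMENT))
)

print("is valid:", is_valid)
for location, test, message in failures:
    print(location, test, message)
```

Note that store_report=True is still required when constructing the validator; without it there is no report to query.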

answered Sep 25 '22 at 02:09 by johan