I'm trying to do some Schematron validation with lxml. For the specific application I'm working on, it's important that any tests that fail validation are reported back. The lxml documentation mentions a validation_report property, which I think should contain the info I'm looking for, but I just can't figure out how to work with it. Here's some example code that demonstrates my problem (adapted from http://lxml.de/validation.html#id2; tested with Python 2.7.4):
import StringIO
from lxml import isoschematron
from lxml import etree

def main():
    # Schema
    f = StringIO.StringIO('''\
    <schema xmlns="http://purl.oclc.org/dsdl/schematron" >
      <pattern id="sum_equals_100_percent">
        <title>Sum equals 100%.</title>
        <rule context="Total">
          <assert test="sum(//Percent)=100">Sum is not 100%.</assert>
        </rule>
      </pattern>
    </schema>
    ''')

    # Parse schema
    sct_doc = etree.parse(f)
    schematron = isoschematron.Schematron(sct_doc, store_report = True)

    # XML to validate - validation will fail because sum of numbers
    # not equal to 100
    notValid = StringIO.StringIO('''\
    <Total>
      <Percent>30</Percent>
      <Percent>30</Percent>
      <Percent>50</Percent>
    </Total>
    ''')

    # Parse xml
    doc = etree.parse(notValid)

    # Validate against schema
    validationResult = schematron.validate(doc)

    # Validation report (assuming here this is where reason
    # for validation failure is stored, but perhaps I'm wrong?)
    report = isoschematron.Schematron.validation_report

    print("is valid: " + str(validationResult))
    print(dir(report.__doc__))

main()
Now, from the value of validationResult I can see that the validation failed (as expected), so next I would like to know why. The result of the second print statement gives me:
['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__', '__
format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__get
slice__', '__gt__', '__hash__', '__init__', '__le__', '__len__', '__lt__', '__mo
d__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__',
'__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook
__', '_formatter_field_name_split', '_formatter_parser', 'capitalize', 'center',
'count', 'decode', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'index
', 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle', 'isupper',
'join', 'ljust', 'lower', 'lstrip', 'partition', 'replace', 'rfind', 'rindex', '
rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', '
strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
Which is about as far as I'm getting, based on the documentation and this related question. Could well be something really obvious I'm overlooking?
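OK, so someone on Twitter gave me a suggestion that made me realise I was referencing the report on the wrong object: I was reading validation_report from the isoschematron.Schematron class, rather than from the schematron instance that performed the validation. Since there don't seem to be any clear examples, I'll share my working solution below: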
import StringIO
from lxml import isoschematron
from lxml import etree

def main():
    # Example adapted from http://lxml.de/validation.html#id2

    # Schema
    f = StringIO.StringIO('''\
    <schema xmlns="http://purl.oclc.org/dsdl/schematron" >
      <pattern id="sum_equals_100_percent">
        <title>Sum equals 100%.</title>
        <rule context="Total">
          <assert test="sum(//Percent)=100">Sum is not 100%.</assert>
        </rule>
      </pattern>
    </schema>
    ''')

    # Parse schema
    sct_doc = etree.parse(f)
    schematron = isoschematron.Schematron(sct_doc, store_report = True)

    # XML to validate - validation will fail because sum of numbers
    # not equal to 100
    notValid = StringIO.StringIO('''\
    <Total>
      <Percent>30</Percent>
      <Percent>30</Percent>
      <Percent>50</Percent>
    </Total>
    ''')

    # Parse xml
    doc = etree.parse(notValid)

    # Validate against schema
    validationResult = schematron.validate(doc)

    # Validation report
    report = schematron.validation_report

    print("is valid: " + str(validationResult))
    print(type(report))
    print(report)

main()
The print statement on the report now results in the following output:
<?xml version="1.0" standalone="yes"?>
<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:schold="http://www.ascc.net/xml/schematron" xmlns:sch="http://www.ascc.net/xml/schematron" xmlns:iso="http://purl.oclc.org/dsdl/schematron" title="" schemaVersion="">
  <!--
  -->
  <svrl:active-pattern id="sum_equals_100_percent" name="Sum equals 100%."/>
  <svrl:fired-rule context="Total"/>
  <svrl:failed-assert test="sum(//Percent)=100" location="/Total">
    <svrl:text>Sum is not 100%.</svrl:text>
  </svrl:failed-assert>
</svrl:schematron-output>
Which is exactly what I was looking for!
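If you need the individual failures rather than the whole report, the SVRL output can be walked like any other XML. Below is a minimal sketch using only the standard library, assuming the report has first been serialized to a string or bytes (for instance with etree.tostring(report)). The helper name failed_asserts and the trimmed sample document are my own, for illustration only; they are not part of lxml.

```python
import xml.etree.ElementTree as ET

# Namespace used by SVRL reports (taken from the report output above).
SVRL_URI = "http://purl.oclc.org/dsdl/svrl"
SVRL_NS = {"svrl": SVRL_URI}


def failed_asserts(svrl_xml):
    """Return a (location, test, message) tuple per svrl:failed-assert.

    Hypothetical helper: svrl_xml is the serialized SVRL report.
    """
    root = ET.fromstring(svrl_xml)
    failures = []
    for node in root.iter("{%s}failed-assert" % SVRL_URI):
        text = node.find("svrl:text", SVRL_NS)
        message = (text.text or "").strip() if text is not None else ""
        failures.append((node.get("location"), node.get("test"), message))
    return failures


# Trimmed copy of the report shown above, used as sample input.
sample = '''<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl">
  <svrl:active-pattern id="sum_equals_100_percent" name="Sum equals 100%."/>
  <svrl:fired-rule context="Total"/>
  <svrl:failed-assert test="sum(//Percent)=100" location="/Total">
    <svrl:text>Sum is not 100%.</svrl:text>
  </svrl:failed-assert>
</svrl:schematron-output>'''

for location, test, message in failed_asserts(sample):
    print("%s: %s (test: %s)" % (location, message, test))
```

For the sample this prints one line: /Total: Sum is not 100%. (test: sum(//Percent)=100). A valid document produces no failed-assert elements, so the list comes back empty.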