Schematron validation with lxml in Python: how to retrieve validation errors?

Tags: python, validation, lxml, schematron

I'm trying to do some Schematron validation with lxml. For the specific application I'm working on, it's important that any tests that fail validation are reported back. The lxml documentation mentions a validation_report property. I think this should contain the info I'm looking for, but I just can't figure out how to work with it. Here's some example code that demonstrates my problem (adapted from http://lxml.de/validation.html#id2; tested with Python 2.7.4):

import StringIO
from lxml import isoschematron
from lxml import etree

def main():

    # Schema
    f = StringIO.StringIO('''\
    <schema xmlns="http://purl.oclc.org/dsdl/schematron" >
    <pattern id="sum_equals_100_percent">
    <title>Sum equals 100%.</title>
    <rule context="Total">
    <assert test="sum(//Percent)=100">Sum is not 100%.</assert>
    </rule>
    </pattern>
    </schema>
    ''')

    # Parse schema
    sct_doc = etree.parse(f)
    schematron = isoschematron.Schematron(sct_doc, store_report = True)

    # XML to validate - validation will fail because sum of numbers
    # not equal to 100 
    notValid = StringIO.StringIO('''\
        <Total>
        <Percent>30</Percent>
        <Percent>30</Percent>
        <Percent>50</Percent>
        </Total>
        ''')
    # Parse xml
    doc = etree.parse(notValid)

    # Validate against schema
    validationResult = schematron.validate(doc)

    # Validation report (assuming here this is where reason 
    # for validation failure is stored, but perhaps I'm wrong?)
    report = isoschematron.Schematron.validation_report

    print("is valid: " + str(validationResult))
    print(dir(report.__doc__))

main()

Now, from the value of validationResult I can see that the validation failed (as expected), so next I would like to know why. The result of the second print statement gives me:

['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__', '__
format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__get
slice__', '__gt__', '__hash__', '__init__', '__le__', '__len__', '__lt__', '__mo
d__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__',
 '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook
__', '_formatter_field_name_split', '_formatter_parser', 'capitalize', 'center',
 'count', 'decode', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'index
', 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle', 'isupper',
'join', 'ljust', 'lower', 'lstrip', 'partition', 'replace', 'rfind', 'rindex', '
rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', '
strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']

Which is about as far as I'm getting, based on the documentation and this related question. Could well be something really obvious I'm overlooking?


asked Nov 26 '14 13:11

johan

1 Answer

OK, so someone on Twitter gave me a suggestion that made me realise my mistake: I was reading validation_report from the Schematron class itself rather than from the schematron instance I had created. Since there don't seem to be any clear examples, I'll share my working solution below:

import StringIO
from lxml import isoschematron
from lxml import etree

def main():
    # Example adapted from http://lxml.de/validation.html#id2

    # Schema
    f = StringIO.StringIO('''\
    <schema xmlns="http://purl.oclc.org/dsdl/schematron" >
    <pattern id="sum_equals_100_percent">
    <title>Sum equals 100%.</title>
    <rule context="Total">
    <assert test="sum(//Percent)=100">Sum is not 100%.</assert>
    </rule>
    </pattern>
    </schema>
    ''')

    # Parse schema
    sct_doc = etree.parse(f)
    schematron = isoschematron.Schematron(sct_doc, store_report = True)

    # XML to validate - validation will fail because sum of numbers 
    # not equal to 100 
    notValid = StringIO.StringIO('''\
        <Total>
        <Percent>30</Percent>
        <Percent>30</Percent>
        <Percent>50</Percent>
        </Total>
        ''')
    # Parse xml
    doc = etree.parse(notValid)

    # Validate against schema
    validationResult = schematron.validate(doc)

    # Validation report 
    report = schematron.validation_report

    print("is valid: " + str(validationResult))
    print(type(report))
    print(report)

main()

The print statement on the report now results in the following output:

 <?xml version="1.0" standalone="yes"?>
<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:schold="http://www.ascc.net/xml/schematron" xmlns:sch="http://www.ascc.net/xml/schematron" xmlns:iso="http://purl.oclc.org/dsdl/schematron" title="" schemaVersion="">
  <!--   
           
           
         -->
  <svrl:active-pattern id="sum_equals_100_percent" name="Sum equals 100%."/>
  <svrl:fired-rule context="Total"/>
  <svrl:failed-assert test="sum(//Percent)=100" location="/Total">
    <svrl:text>Sum is not 100%.</svrl:text>
  </svrl:failed-assert>
</svrl:schematron-output>

Which is exactly what I was looking for!
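
If you need the individual failures rather than the whole XML document, the failed-assert elements in the report above can be walked like any other XML tree. The following is only a minimal sketch: the names failed_asserts, SVRL_NS and SVRL_REPORT are made up for this illustration, and it parses an abridged copy of the report with the standard library's ElementTree so that it runs without lxml. I'm assuming the object returned by schematron.validation_report can be passed to the same function, since lxml trees also provide iter(), find() and itertext().

```python
import xml.etree.ElementTree as ET

# Namespace used by the SVRL report shown above
SVRL_NS = "http://purl.oclc.org/dsdl/svrl"

# Abridged copy of the report printed above (illustration only)
SVRL_REPORT = """<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl" title="" schemaVersion="">
  <svrl:active-pattern id="sum_equals_100_percent" name="Sum equals 100%."/>
  <svrl:fired-rule context="Total"/>
  <svrl:failed-assert test="sum(//Percent)=100" location="/Total">
    <svrl:text>Sum is not 100%.</svrl:text>
  </svrl:failed-assert>
</svrl:schematron-output>"""


def failed_asserts(report):
    """Return a (test, location, message) tuple for each svrl:failed-assert."""
    results = []
    for node in report.iter("{%s}failed-assert" % SVRL_NS):
        text = node.find("{%s}text" % SVRL_NS)
        # Join all text inside svrl:text in case it contains child elements
        message = "".join(text.itertext()).strip() if text is not None else ""
        results.append((node.get("test"), node.get("location"), message))
    return results


failures = failed_asserts(ET.fromstring(SVRL_REPORT))
for test, location, message in failures:
    print("%s at %s: %s" % (test, location, message))
```

Returning plain tuples keeps the caller independent of the XML library; an empty list would then mean that no assertion failed.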

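As for why the attempt in the question printed a list of string methods: as far as I can tell, looking up a property on a class gives you the property object itself, and its __doc__ is just a docstring, so dir() lists the methods of str. The sketch below reproduces that behaviour with a made-up Validator class (a hypothetical stand-in, not lxml code), on the assumption that validation_report is an ordinary Python property.

```python
class Validator(object):
    """Hypothetical stand-in for a class that exposes a property."""

    def __init__(self):
        self._report = "report for this instance"

    @property
    def validation_report(self):
        """Docstring of the property."""
        return self._report


# Looked up on the class: the property object itself, not a report
on_class = Validator.validation_report
print(type(on_class).__name__)
# Its __doc__ is a plain string, hence the string methods in the dir() output
print(type(on_class.__doc__).__name__)

# Looked up on an instance: the getter runs and returns the stored value
on_instance = Validator().validation_report
print(on_instance)
```
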
answered Sep 25 '22 02:09

johan
