XMLSchema: Is it possible to calculate how valid an invalid document is (eg. as a percentage)?

Question

I'm using lxml in Python to validate a number of XML documents against an XML Schema definition. Many of these documents do not validate (and at the moment they aren't expected to), but for reporting purposes it would be useful to calculate how valid they are, as a percentage. I can also use xmllint or other command-line tools, if they can provide a useful statistic.

Sean Vieira · Accepted Answer

lxml parsers keep a log of the errors they encounter while parsing a document (the parser's error_log). Combine this with the parser's recover keyword argument, which makes the parser carry on past errors instead of giving up, and you get something like this:

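# Warning, untested, may not work
from lxml import etree

parser = etree.XMLParser(recover=True)
# your_xml_data: a filename, URL or file-like object
it_would_be_a_tree = etree.parse(your_xml_data, parser)
# error_log holds the parse (well-formedness) errors from the last run
total_errors = len(parser.error_log)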
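To count violations of the XSD itself rather than parse errors, the validator's log is the one to read: etree.XMLSchema.validate() returns a bool and leaves the individual problems in the schema's error_log. A minimal sketch, where the inline schema, the sample documents and the schema_errors helper are all hypothetical:

```python
import io

from lxml import etree

# Hypothetical schema for illustration: a <people> root holding one or
# more <person> elements, each with a single integer <age> child.
XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="people">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="person" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="age" type="xs:integer"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def schema_errors(xml_bytes, xsd_bytes):
    """Return the list of XML Schema errors reported for one document."""
    schema = etree.XMLSchema(etree.fromstring(xsd_bytes))
    # recover=True keeps going past well-formedness errors, so a slightly
    # broken document can usually still be validated.
    parser = etree.XMLParser(recover=True)
    tree = etree.parse(io.BytesIO(xml_bytes), parser)
    # validate() only returns True/False; the individual problems are
    # collected in schema.error_log for the last validation run.
    schema.validate(tree)
    return list(schema.error_log)


valid = b"<people><person><age>42</age></person></people>"
invalid = b"""<people>
  <person><age>abc</age></person>
  <person><age>7</age></person>
</people>"""

print(len(schema_errors(valid, XSD)))  # prints 0
for entry in schema_errors(invalid, XSD):
    # each log entry carries a line number and a libxml2 message
    print(entry.line, entry.message)
```

The resulting list can stand in for total_errors in any of the measures below.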

Then you can express total_errors relative to the size of the document. Naive measures, such as errors per line or errors per character, are easy to compute. If it_would_be_a_tree is a usable tree, more structural measures are also possible, for example relating the error count to the number of elements (total_elements / total_errors gives the number of elements per error).
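A sketch of the naive measures, plus one possible element-based percentage. It reuses the recover=True parser from the answer, so it only sees well-formedness errors. The error_measures name and the percent_clean_elements heuristic (the share of elements whose source line has no error reported on it) are hypothetical choices for illustration, not something lxml provides:

```python
from lxml import etree


def error_measures(xml_bytes):
    """Relate the parser's error count to the size of one document.

    errors_per_line and errors_per_char are the naive measures;
    percent_clean_elements is a hypothetical heuristic: the share of
    elements whose source line has no error reported on it.
    """
    parser = etree.XMLParser(recover=True)
    root = etree.fromstring(xml_bytes, parser)
    errors = list(parser.error_log)

    measures = {
        "errors": len(errors),
        "errors_per_line": len(errors) / (xml_bytes.count(b"\n") + 1),
        "errors_per_char": len(errors) / max(len(xml_bytes), 1),
    }

    if root is not None:
        # Passing etree.Element as the tag filter skips comments and PIs.
        elements = list(root.iter(etree.Element))
        error_lines = {entry.line for entry in errors}
        clean = sum(1 for el in elements if el.sourceline not in error_lines)
        measures["elements"] = len(elements)
        measures["percent_clean_elements"] = 100.0 * clean / len(elements)
    return measures


print(error_measures(b"<a>\n  <b>ok</b>\n</a>"))
print(error_measures(b"<a>\n  <b>oops</a>\n"))
```

Because several errors can share one line and a single error can affect many elements, the percentage is a rough indicator for reporting rather than an exact measure of validity. The same line-matching idea can be applied to the entries in an XMLSchema error_log.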

Tags: xml, lxml, xsd, xmllint

Asked by Phillip B Oldham