 

XML Schema Validation with RelaxNG

Which XML validation tools can you recommend for both performance and accuracy? Both are critical for our system. We have the following requirements:

  • It is not xmllint (see below)
  • Supports RelaxNG
  • Can easily integrate with Perl (this is optional, but it would be nice)

Why not xmllint? (This is background and you can skip it if you like)

We have a large Perl system which uses RelaxNG to validate our XML. We use the compact RelaxNG format and trang to convert it to the standard RelaxNG format. Then we do the actual validation via xmllint.
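For reference, the pipeline described above can be sketched roughly as follows. This is an illustration in Python rather than the actual Perl code; the jar location `trang.jar`, the file names, and the helper names are all assumptions made for the example.

```python
import subprocess
from pathlib import Path


def trang_command(rnc_path, rng_path, trang_jar="trang.jar"):
    """Build the command that converts a compact (.rnc) schema to XML (.rng) syntax."""
    # trang infers the input and output formats from the file extensions.
    return ["java", "-jar", trang_jar, str(rnc_path), str(rng_path)]


def xmllint_command(rng_path, xml_path):
    """Build the command that validates a document against a RelaxNG schema."""
    # --noout suppresses the echoed document; diagnostics are written to stderr.
    return ["xmllint", "--noout", "--relaxng", str(rng_path), str(xml_path)]


def validate(rnc_path, xml_path, trang_jar="trang.jar"):
    """Convert the schema, then validate; return (is_valid, diagnostics)."""
    rng_path = Path(rnc_path).with_suffix(".rng")
    # check=True raises if the schema conversion itself fails.
    subprocess.run(trang_command(rnc_path, rng_path, trang_jar), check=True)
    result = subprocess.run(
        xmllint_command(rng_path, xml_path), capture_output=True, text=True
    )
    # xmllint exits with a non-zero status when the document does not validate.
    return result.returncode == 0, result.stderr
```

Calling `validate("schema.rnc", "doc.xml")` would need both `java` (with trang) and `xmllint` available on the machine.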

That's when the problems kick in. xmllint routinely reports validation errors incorrectly. It doesn't give false positives or negatives, but if the document fails to validate, xmllint will often report the wrong element or attribute for a given error. Sometimes the reported error is technically correct ("did not expect to see element 'bar'") but only a symptom of an earlier error that was not reported: 'bar' was supposed to follow the required element 'foo', which is missing, and xmllint doesn't tell us that. This is a long-standing problem with xmllint, and even the latest version has it. We often have huge XML documents, and misreported errors cause much grief for both clients and developers.

Ovid asked Nov 03 '08 12:11





1 Answer

I think that JDrago has the right idea, that you need to avoid libxml2-based tools for RNG validation, at least for now. I'm discovering this as well in my project. I recently logged two bugs against libxml2 concerning RNG validation.

I recommend jing. It was written by James Clark, the creator of Relax NG and one of the leading lights in the XML world. He is also the author of trang, which you are already using. Development of this code (and of trang) recently resumed on Google Code.

Jing has proved consistently correct with our content and schema, and it gives much better error messages than libxml2, though there is still a lot of room for improvement in that regard.
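As a rough illustration, jing can be driven from a script by running its jar and reading back the diagnostics. The sketch below is in Python and rests on several assumptions: the jar location `jing.jar`, the helper names, and the diagnostic line shape `location:line:column: severity: message`, which should be checked against the jing version in use.

```python
import re
import subprocess

# Assumed diagnostic shape: "<location>:<line>:<column>: <severity>: <message>".
DIAGNOSTIC = re.compile(
    r"^(?P<source>.+?):(?P<line>\d+):(?P<column>\d+): "
    r"(?P<severity>\w+): (?P<message>.*)$"
)


def jing_command(schema_path, xml_path, jing_jar="jing.jar"):
    """Build the jing command line for one schema and one document."""
    cmd = ["java", "-jar", jing_jar]
    if str(schema_path).endswith(".rnc"):
        # -c tells jing that the schema is in compact syntax.
        cmd.append("-c")
    return cmd + [str(schema_path), str(xml_path)]


def parse_diagnostics(output):
    """Turn diagnostic lines into dicts; lines that do not match are skipped."""
    found = []
    for line in output.splitlines():
        match = DIAGNOSTIC.match(line.strip())
        if match:
            entry = match.groupdict()
            entry["line"] = int(entry["line"])
            entry["column"] = int(entry["column"])
            found.append(entry)
    return found


def validate(schema_path, xml_path, jing_jar="jing.jar"):
    """Run jing; return (is_valid, list of parsed diagnostics)."""
    result = subprocess.run(
        jing_command(schema_path, xml_path, jing_jar),
        capture_output=True,
        text=True,
    )
    # Both streams are scanned so the sketch does not depend on which one is used.
    return result.returncode == 0, parse_diagnostics(result.stdout + result.stderr)
```

Because jing accepts the compact syntax directly via `-c`, the separate trang conversion step is not needed for validation. The same command line could be run from Perl with backticks or `system`.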

The one shortcoming of jing vis-à-vis libxml2/xmllint is that it doesn't at present use OASIS XML catalogs to resolve public and system identifiers and URIs pointing to schemas. This would be an issue if you have included schemas that are referred to by an 'http' URI: those would always be fetched over the network.

ChuckB answered Oct 09 '22 03:10
