
XML Validation in ant failing with errors that don't match the files being validated

Thanks in advance for any help...

I am having a problem with XML files that are failing validation against a DTD (via the ant xmlvalidate task), but the reported errors in the XML doc do not match the contents of the document being validated. Furthermore, the same files opened in Oxygen validate without problems.

An example of the ant output reporting the errors is as follows:

[xmlvalidate] /Path/to/file.xml:240:91: Attribute "match_style" with value "ble" must have a value from the list "any all none ".

On visual inspection of the file in question, the value of the match_style attribute on line 240 is all. A search of the file shows that the string ble, while it does occur several times in the document (as a substring of table in tags, and also of enable as an attribute name), doesn't appear at all between lines 145 and 328.

I have tried hand-editing the XML files and revalidating. If I remove line breaks or other whitespace (and sometimes if I add line breaks) earlier in the file than the reported error, making no other changes, it will occasionally fix things entirely and the file will then validate. In other cases, it still fails, but the error is further down the file, and additional edits to whitespace closer to the new error will fix things or move the "error" down even further. I have not been able to discern any rhyme or reason as to which such edits will fix things and which won't.

Just to repeat the salient point: sometimes changing the whitespace and making no other changes causes the file to validate.

I have searched the XML files for invisible and control characters that might be doing weird things, but haven't found anything other than garden variety whitespace, all where it is supposed to be.

The files are produced via XSLT 2.0 transformation from source files in various other flavours of XML. The transformation is done via the Java task using Saxon, in an ant build. (I haven't been able to get either the XSLT task or Saxon task to work as desired because my XSLs in some cases produce multiple result docs from a single source file, and all but the first result doc always seem to be omitted with those tasks.) Here's the task:

<java classname="net.sf.saxon.Transform" fork="true"
    output="${dest.dir}/build"
    resultproperty="transform_result"
    failonerror="true">
    <arg line="-o ${dest.dir}/ ${source.dir}/xml_sources ${source.dir}/xsl/transform.xsl"/>
</java>

I have also searched the XSLs and source XML files for unusual characters, and experimented with character maps in the XSLs and with indenting in the result docs to make sure there is nothing odd going on with unusual whitespace characters. I found nothing unusual, and the only effect of the character maps or the indenting changes is essentially the same as editing whitespace by hand: the "errors" sometimes move around, but still happen.

I have tried different versions of Saxon and different versions of Ant, with no change in results. The problem started a while back (I'm not sure exactly when), but everything did once work, so I've tried older versions of my XSLs and sources. I haven't found an older version that doesn't display the problem, though I may simply not be able to go back far enough: after a switch from CVS to SVN, the CVS repository no longer exists and some of the oldest revisions are lost.

The DTD the files must validate against is not mine - I cannot change it or switch to a schema.

I usually work on a Mac (currently running 10.7.5), but the problem also happens on Linux (not sure what version). The one variable I haven't really been able to play with is Java. I may have been running Java 1.5 back when things last worked properly - I am now using 1.7, and it definitely happened when I was running 1.6. I can't go back to 1.5 on the machines I have available.

That's all the information I can think of that might be relevant.

I am at my wits' end with this problem. In all my research, I've never so much as heard of anyone else reporting the same issue, let alone resolving it. Any thoughts on what might be wrong would be greatly, greatly appreciated.

Thanks!

Dawn, asked Nov 01 '22 10:11



1 Answer

My suspicion would be the Xerces parser that comes with the JDK, which is buggy. Try with the version of Xerces from Apache, which is much better.

(I say this because I have previously seen the JDK version of Xerces misreport attribute values that contain strings which are present in the document, but not as the values of attributes.)
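As a rough sketch of what switching parsers could look like in the build: the xmlvalidate task accepts a classname attribute naming the SAX parser to use, and a classpathref (or nested classpath) telling Ant where to find it. The property xerces.lib.dir, its location, the jar file names, the dest.dir value, and the include pattern below are all assumptions for illustration; none of them come from the build file in the question.

```xml
<project name="validate-with-xerces" default="validate">

    <!-- Assumption: directory where the Apache Xerces-J jars were unpacked. -->
    <property name="xerces.lib.dir" location="lib/xerces"/>
    <!-- Assumption: the directory the transformation writes its result docs to. -->
    <property name="dest.dir" location="build/output"/>

    <!-- Classpath holding the Apache release of Xerces rather than the JDK's bundled copy.
         The jar names are the usual ones in a Xerces-J distribution; adjust as needed. -->
    <path id="xerces.classpath">
        <fileset dir="${xerces.lib.dir}">
            <include name="xercesImpl.jar"/>
            <include name="xml-apis.jar"/>
        </fileset>
    </path>

    <target name="validate">
        <!-- classname selects the Apache parser explicitly, so the task does not
             fall back to the JDK-internal parser (com.sun.org.apache.xerces.internal...). -->
        <xmlvalidate classname="org.apache.xerces.parsers.SAXParser"
                     classpathref="xerces.classpath"
                     failonerror="true"
                     lenient="false">
            <fileset dir="${dest.dir}" includes="**/*.xml"/>
        </xmlvalidate>
    </target>

</project>
```

If the files validate cleanly with this configuration while still failing with the default parser, that would point to the bundled parser rather than to the XSLT output.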

Michael Kay, answered Nov 15 '22 10:11
