
XML Validation in Java: processContents="lax" seems not to work correctly

Tags: java, xml, xsd, sax

I have an XML Schema which contains a number of

<any namespace="##other" processContents="lax" minOccurs="0" maxOccurs="unbounded" />

definitions, i.e., it allows arbitrary tags from other namespaces to be inserted. processContents="lax" indicates that the parser should try to validate these tags if it has the corresponding schema (1) (2).

To me this means that if I give the parser all the schema documents and the XML contains an invalid tag from one of the secondary namespaces, it has to report an error.

However, the Java XML validator seems to ignore such errors. I have verified that the parser has all the schema documents it needs to perform the validation: if I change the XML schema to processContents="strict", it works as expected and uses the secondary schema documents for validation. The validator seems to behave as if the attribute were set to skip.

Java code for validation:

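import java.io.File;
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

/*
 * xmlDokument is the file name of the XML document
 * xsdSchema is an array with all schema documents
 */
public static void validate( String xmlDokument, Source[] xsdSchema ) throws SAXException, IOException {
  SchemaFactory schemaFactory = SchemaFactory.newInstance( XMLConstants.W3C_XML_SCHEMA_NS_URI );
  Schema schema = schemaFactory.newSchema( xsdSchema );
  Validator validator = schema.newValidator();
  validator.setErrorHandler( new MyErrorHandler() );
  validator.validate( new StreamSource(new File(xmlDokument)) );
}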
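
MyErrorHandler is not shown in the question. As a rough sketch only, assuming it is meant to record validation problems rather than abort on the first one, such a handler might look like the following. The getProblems accessor is a hypothetical name introduced here for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

// Hypothetical stand-in for the MyErrorHandler used in the question.
class MyErrorHandler implements ErrorHandler {

    // Every warning and recoverable error reported by the validator.
    private final List<SAXParseException> problems = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        problems.add(e);
    }

    @Override
    public void error(SAXParseException e) {
        // Recoverable validation errors (cvc-* messages) arrive here.
        problems.add(e);
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXParseException {
        // Not recoverable (for example, malformed XML), so rethrow.
        throw e;
    }

    // Assumed accessor, so the caller can inspect what was reported.
    public List<SAXParseException> getProblems() {
        return problems;
    }
}
```

With a handler like this, an empty list after validate() returns means the validator reported nothing, which is the situation described in the question.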

Minimal example:

The primary schema:

<xs:schema
    xmlns="baseNamespace"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="baseNamespace"
    xmlns:tns="baseNamespace">

<!-- Define single tag "baseTag" -->
<xs:element name="baseTag">
  <xs:complexType>
    <xs:sequence>
      <xs:any namespace="##other" processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
</xs:schema>

The secondary schema:

<xs:schema
    xmlns="secondaryNamespace"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="secondaryNamespace"
    xmlns:tns="secondaryNamespace"
    elementFormDefault="qualified"
    attributeFormDefault="qualified">

<xs:element name="additionalTag"/>

</xs:schema>

The XML document I am trying to validate:

<baseTag
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns="baseNamespace"
  xmlns:secondary="secondaryNamespace"
  xsi:schemaLocation="
    baseNamespace base.xsd
    secondaryNamespace secondary.xsd">

  <secondary:additionalTag/>
  <secondary:invalidTag/>
</baseTag>

With the Java code above and both schema documents supplied, no validation errors are reported. An error appears only if I change lax to strict in the base schema (which I don't want). In that case the error message is

cvc-complex-type.2.4.c: The matching wildcard is strict, but no declaration can be found for element 'secondary:invalidTag'.

Questions:

Did I misunderstand something and is this actually the correct behavior? Or am I right regarding processContents?

Are my schema documents doing the right thing?

Is my Java code correct? How could I change it so that it behaves as expected?

Philipp Wendler asked Oct 19 '11 11:10


1 Answer

According to the spec:

"It will validate elements and attributes for which it can obtain schema information, but it will not signal errors for those it cannot obtain any schema information."

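So, when you use processContents="lax", the validator finds no declaration for "invalidTag" in the schemas it has, and therefore ignores that element without reporting an error, as per the spec. Elements that do have a declaration, such as "additionalTag", are still validated against it. With "strict", a declaration is required, which is why the error appears only in that mode.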
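
This behaviour can be checked with a small self-contained program. The sketch below is an illustration and not part of the original answer: it inlines the two schemas from the question as strings and, as an assumption made only for this demo, types additionalTag as xs:int so that a declared element can actually be invalid. The class name LaxWildcardDemo and the helper validate are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

class LaxWildcardDemo {

    // Primary schema from the question, with the lax wildcard.
    static final String BASE_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' targetNamespace='baseNamespace'>"
      + "<xs:element name='baseTag'><xs:complexType><xs:sequence>"
      + "<xs:any namespace='##other' processContents='lax' minOccurs='0' maxOccurs='unbounded'/>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Secondary schema. Assumption for this demo: additionalTag is an xs:int.
    static final String SECONDARY_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
      + " targetNamespace='secondaryNamespace' elementFormDefault='qualified'>"
      + "<xs:element name='additionalTag' type='xs:int'/></xs:schema>";

    // Validates the given XML text against both schemas and returns the
    // messages of all recoverable validation errors.
    static List<String> validate(String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new Source[] {
            new StreamSource(new StringReader(BASE_XSD)),
            new StreamSource(new StringReader(SECONDARY_XSD))
        });
        Validator validator = schema.newValidator();
        List<String> errors = new ArrayList<>();
        validator.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) { errors.add(e.getMessage()); }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        validator.validate(new StreamSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        String xml =
            "<baseTag xmlns='baseNamespace' xmlns:secondary='secondaryNamespace'>"
          + "<secondary:additionalTag>notANumber</secondary:additionalTag>" // declared, invalid content
          + "<secondary:invalidTag/>"                                       // no declaration at all
          + "</baseTag>";
        for (String message : validate(xml)) {
            System.out.println(message);
        }
    }
}
```

With the JDK's built-in validator, this should report errors for the content of additionalTag, which has a declaration, and nothing for invalidTag, which has none. That matches the lax rule quoted above: validate where a declaration is available, stay silent where it is not.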

jtahlborn answered Oct 13 '22 23:10