XmlSchema Whitespace collapse: What happens to multiple whitespaces?

I use the following XmlSchema:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  targetNamespace="http://www.test.com/XmlValidation"
  elementFormDefault="qualified"
  attributeFormDefault="unqualified"
  xmlns:m="http://www.test.com/XmlValidation">

  <xs:element name="test">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="testElement" type="m:requiredStringType"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:simpleType name="requiredStringType">
    <xs:restriction base="xs:string">
      <xs:minLength value="1"/>
      <xs:whiteSpace value="collapse"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>

It defines a requiredStringType that must be at least one character long and that uses whitespace collapse.

When I validate the following XML document, the validation succeeds:

<?xml version="1.0" encoding="UTF-8"?>
<test xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.text.com/XmlValidation">
    <testElement>     </testElement>
</test>

The W3C XML Schema specification defines whitespace collapse as follows:

"After the processing implied by replace, contiguous sequences of #x20's are collapsed to a single #x20, and leading and trailing #x20's are removed."

Does this mean that multiple spaces are collapsed to one space or to none? In XmlSpy the validation fails; in .NET it succeeds.

crauscher asked Mar 20 '09 13:03


1 Answer

Since leading and trailing spaces are removed, a string that contains only whitespace is collapsed to an empty string, and an empty string violates the minLength value="1" facet. XmlSpy is therefore validating correctly, and .NET is being lenient (or is making an error).

This follows the section "White Space Normalization during Validation" in XML Schema Part 1: Structures, Second Edition:

preserve
No normalization is done, the value is the ·normalized value·
replace
All occurrences of #x9 (tab), #xA (line feed) and #xD (carriage return) are replaced with #x20 (space).
collapse
Subsequent to the replacements specified above under replace, contiguous sequences of #x20s are collapsed to a single #x20, and initial and/or final #x20s are deleted.
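The three modes can be sketched in Python as a minimal illustration of the rules as quoted. This is not the code used by XmlSpy or .NET, and the function name normalize_whitespace is hypothetical.

```python
import re


def normalize_whitespace(value: str, mode: str) -> str:
    """Apply an XML Schema whiteSpace mode: 'preserve', 'replace' or 'collapse'."""
    if mode == "preserve":
        # No normalization: the value is returned unchanged.
        return value
    # 'replace' (and the first step of 'collapse'): each tab, line feed and
    # carriage return becomes a single space (#x20).
    replaced = value.replace("\t", " ").replace("\n", " ").replace("\r", " ")
    if mode == "replace":
        return replaced
    if mode == "collapse":
        # Runs of two or more spaces become one space, then any leading
        # and trailing space is deleted.
        return re.sub(r" {2,}", " ", replaced).strip(" ")
    raise ValueError(f"unknown whiteSpace mode: {mode!r}")


for mode in ("preserve", "replace", "collapse"):
    print(mode, repr(normalize_whitespace("  a\t\tb \n", mode)))
```

Note that under collapse a value made only of spaces has nothing left once the leading and trailing spaces are deleted, so it ends up as the empty string rather than a single space.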

Taken in order, the collapse rule works in three steps: first, every tab, line feed and carriage return is replaced by a space; second, each contiguous run of spaces is replaced by a single space; third, leading and trailing spaces are deleted. Following this sequence, a string containing only whitespace must be normalized to the empty string during validation.
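To make the consequence for requiredStringType concrete, here is a small Python sketch that collapses the content of testElement and then applies the minLength value="1" facet from the question's schema. It models only these two facets, on the basis that facets are checked against the normalized value. The helper is_valid_required_string is hypothetical and is not a real validator API.

```python
import re

MIN_LENGTH = 1  # from <xs:minLength value="1"/> in requiredStringType


def collapse(value: str) -> str:
    # Step 1: tab, line feed and carriage return -> space (#x20).
    value = re.sub(r"[\t\n\r]", " ", value)
    # Step 2: each contiguous run of spaces -> a single space.
    value = re.sub(r" +", " ", value)
    # Step 3: delete the leading and trailing space, if any.
    return value.strip(" ")


def is_valid_required_string(raw: str) -> bool:
    # minLength is checked against the collapsed (normalized) value.
    return len(collapse(raw)) >= MIN_LENGTH


content = "     "  # text of <testElement> in the question
print(repr(collapse(content)), is_valid_required_string(content))
print(repr(collapse("  hello   world ")), is_valid_required_string("  hello   world "))
```

If collapse left a single space, the length would be 1 and the check would pass. Under the specified steps the length is 0, so the check fails, which matches XmlSpy's result.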

Eddie answered Sep 21 '22 02:09