
Creating a valid XSD that is open using <all> and <any> elements

Tags:

jaxb

xsd

I need to specify an XSD for validating XML documents. The XSD will also be used to generate Java bindings with JAXB. My problem is specifying optional elements whose names I do not know and which I am generally not interested in parsing.

The XML documents are structured like this:

<TRADE>
  <TIME>12:12</TIME>
  <MJELLO>12345</MJELLO>
  <OPTIONAL>12:12</OPTIONAL>
  <DATE>25-10-2011</DATE>
  <HELLO>hello should be ignored</HELLO>
</TRADE>

The important points are:

  • I cannot assume any order; the next XML document instance might have the tags in a different order
  • I am only interested in parsing some of the tags; some are mandatory and some are optional
  • The XML documents can be extended with new elements which I am not interested in parsing

My XSD is structured like this (it is not a valid XSD):

<?xml version="1.0" encoding="ISO-8859-1"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- *********************************************** -->
  <!-- Trade element definitions for the XML Documents -->
  <!-- *********************************************** -->

  <xs:complexType name="Trade">
    <!-- Using the all construction ensures that the order does not matter -->
    <xs:all>
      <xs:element name="DATE" type="xs:string" minOccurs="1" maxOccurs="1" />
      <xs:element name="TIME" type="xs:string" minOccurs="1" maxOccurs="1" />
      <xs:element name="OPTIONAL" type="xs:string" minOccurs="0" maxOccurs="1" />
      <xs:any minOccurs="0"/>
    </xs:all>
  </xs:complexType>

  <!-- TRADE is the mandatory top-level tag -->
  <xs:element name="TRADE" type="Trade"/>

</xs:schema>

So, in this example, DATE and TIME are mandatory (each must appear in the XML exactly once) and OPTIONAL may appear at most once. Beyond that, I would like to specify that all other tags are allowed. The order does not matter.

How do I specify a valid XSD for this?

Morten Frank asked Mar 03 '11 08:03


2 Answers

This is a classic parser problem.

Basically, your BNF is:

Trade    = whatever whatever*
whatever = "DATE"  | "TIME" | anything
anything = a-z a-z*

But this is ambiguous: the string "DATE" can be accepted by the whatever rule both as "DATE" and as anything.

So if you have

<TRADE>
  <TIME>12:12</TIME>
  <DATE>25-10-2011</DATE>
  <DATE>25-12-2011</DATE>
</TRADE>

it is unclear whether that should be accepted or not.

It could be interpreted as any one of:

"TIME", "DATE", anything
anything, anything, "DATE"
anything, anything, anything
"TIME", "DATE", "DATE"
etc.

It all boils down to this: if you combine a wildcard with arbitrary element order, you cannot meaningfully decide which token matches which rule.

In particular, it does not make sense to have optional elements together with a wildcard.
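One way to see this in practice is to hand such schemas to a schema compiler. The following is a minimal sketch, assuming the XSD 1.0 validator that ships with the JDK (javax.xml.validation); the class name WildcardAmbiguityCheck and the three schema strings are made up for illustration. It tries to compile the question's xs:all plus xs:any content model, a sequence in which an optional element is followed by a wildcard, and a plain xs:all without a wildcard as a control.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper; the name and the schema strings are illustrative only.
class WildcardAmbiguityCheck {

    // The three known elements from the question.
    static final String ELEMENTS =
          "<xs:element name=\"DATE\" type=\"xs:string\"/>"
        + "<xs:element name=\"TIME\" type=\"xs:string\"/>"
        + "<xs:element name=\"OPTIONAL\" type=\"xs:string\" minOccurs=\"0\"/>";

    // Wraps a content model in a complete schema with TRADE as top-level element.
    static String schema(String contentModel) {
        return "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
             + "<xs:complexType name=\"Trade\">" + contentModel + "</xs:complexType>"
             + "<xs:element name=\"TRADE\" type=\"Trade\"/>"
             + "</xs:schema>";
    }

    // The question's attempt: a wildcard inside xs:all.
    static final String ALL_WITH_ANY =
        schema("<xs:all>" + ELEMENTS + "<xs:any minOccurs=\"0\"/></xs:all>");

    // An optional element directly followed by a wildcard that could also match it.
    static final String OPTIONAL_THEN_ANY =
        schema("<xs:sequence>" + ELEMENTS
             + "<xs:any minOccurs=\"0\" maxOccurs=\"unbounded\" processContents=\"skip\"/>"
             + "</xs:sequence>");

    // Control: xs:all without any wildcard.
    static final String ALL_ONLY = schema("<xs:all>" + ELEMENTS + "</xs:all>");

    // Returns true if the schema compiles, false if the compiler rejects it.
    static boolean compiles(String xsd) {
        try {
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                         .newSchema(new StreamSource(new StringReader(xsd)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("all + any:       " + compiles(ALL_WITH_ANY));
        System.out.println("optional + any:  " + compiles(OPTIONAL_THEN_ANY));
        System.out.println("all only:        " + compiles(ALL_ONLY));
    }
}
```

With an XSD 1.0 processor, only the control schema should be accepted. An XSD 1.1 processor may behave differently, since 1.1 relaxes some of these restrictions.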

You have two options:

  • use xs:sequence instead of xs:all
  • do not use a wildcard

As I understand it, both options are in conflict with your wishes.
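To make the first option concrete, here is a rough sketch, again assuming the JDK's built-in XSD 1.0 validator; the class name SequenceWithWildcard and the sample documents are invented for illustration. The mandatory elements come first in a fixed order, followed by a skipped wildcard. Note the trade-offs: element order is now fixed, and OPTIONAL is no longer declared explicitly, because it would clash with the wildcard, so it is simply swallowed by xs:any.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper; name and schema are illustrative, not the asker's final XSD.
class SequenceWithWildcard {

    // DATE and TIME are mandatory and ordered; anything may follow them.
    static final String XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:complexType name=\"Trade\">"
        + "<xs:sequence>"
        + "<xs:element name=\"DATE\" type=\"xs:string\"/>"
        + "<xs:element name=\"TIME\" type=\"xs:string\"/>"
        + "<xs:any minOccurs=\"0\" maxOccurs=\"unbounded\" processContents=\"skip\"/>"
        + "</xs:sequence>"
        + "</xs:complexType>"
        + "<xs:element name=\"TRADE\" type=\"Trade\"/>"
        + "</xs:schema>";

    // Returns true if the document validates against XSD, false otherwise.
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String ordered = "<TRADE><DATE>25-10-2011</DATE><TIME>12:12</TIME>"
                       + "<OPTIONAL>12:12</OPTIONAL><HELLO>ignored</HELLO></TRADE>";
        String reordered = "<TRADE><TIME>12:12</TIME><DATE>25-10-2011</DATE></TRADE>";
        System.out.println("ordered:   " + isValid(ordered));
        System.out.println("reordered: " + isValid(reordered));
    }
}
```

The document from the question, with TIME before DATE, would be rejected by this schema, which is exactly the conflict with the stated wishes.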

Perhaps you can construct a wildcard that matches everything except DATE, TIME etc.

Jørgen Elgaard Larsen answered Nov 17 '22 14:11


Is it a hard requirement to have JAXB bindings to your "known" elements? If not, you can basically have just <any maxOccurs="unbounded" processContents="skip"/> as the content of TRADE in your XSD, and then pick out the elements you are interested in from the DOM tree.

(See here how to use JAXB without data binding.)
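As a rough illustration of the "pick out the elements from the DOM tree" part, the sketch below uses only the JDK's DOM parser and no JAXB at all. The class name TradeDomReader, its methods, and the choice to enforce the mandatory and at-most-once rules in code are assumptions made for this example, not something prescribed above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper; names and rules are illustrative only.
class TradeDomReader {

    static final List<String> KNOWN = Arrays.asList("DATE", "TIME", "OPTIONAL");
    static final List<String> REQUIRED = Arrays.asList("DATE", "TIME");

    // Collects the known child elements of the root in any order and ignores the rest.
    // Throws IllegalArgumentException for malformed XML, duplicates, or missing mandatory tags.
    static Map<String, String> read(String xml) {
        NodeList children;
        try {
            children = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement().getChildNodes();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
        Map<String, String> values = new LinkedHashMap<>();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() != Node.ELEMENT_NODE) continue; // skip whitespace text
            String name = node.getNodeName();
            if (!KNOWN.contains(name)) continue;                   // unknown tag: ignore
            if (values.containsKey(name)) {
                throw new IllegalArgumentException("duplicate element: " + name);
            }
            values.put(name, node.getTextContent());
        }
        for (String name : REQUIRED) {
            if (!values.containsKey(name)) {
                throw new IllegalArgumentException("missing element: " + name);
            }
        }
        return values;
    }

    // Convenience wrapper: true if read(xml) succeeds.
    static boolean accepts(String xml) {
        try {
            read(xml);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String xml = "<TRADE><TIME>12:12</TIME><MJELLO>12345</MJELLO>"
                   + "<OPTIONAL>12:12</OPTIONAL><DATE>25-10-2011</DATE>"
                   + "<HELLO>hello should be ignored</HELLO></TRADE>";
        System.out.println(read(xml));
    }
}
```

The cost of this approach is that the occurrence rules live in Java code rather than in the schema, so the XSD no longer documents which elements are mandatory.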

Tore Green answered Nov 17 '22 13:11