
Using a schema to reorder the elements of an XML document in conformance with the schema

Tags: java, xml, xsd, xsom

Say I have an XML document (represented as text, a W3C DOM, whatever), and also an XML Schema. The XML document has all the right elements as defined by the schema, but in the wrong order.

How do I use the schema to "re-order" the elements in the document to conform to the ordering defined by the schema?

I know that this should be possible, probably using XSOM, since the JAXB XJC code generator annotates its generated classes with the correct serialization order of the elements.

However, I'm not familiar with the XSOM API, and it's pretty dense, so I'm hoping one of you lot has some experience with it, and can point me in the right direction. Something like "what child elements are permitted inside this parent element, and in what order?"


Let me give an example.

I have an XML document like this:

<A>
   <Y/>
   <X/>
</A>

I have an XML Schema which says that the contents of <A> must be an <X> followed by a <Y>. Now clearly, if I try to validate the document against the schema, it fails, since the <X> and <Y> are in the wrong order. But I know my document is "wrong" in advance, so I'm not using the schema to validate just yet. However, I do know that my document has all of the correct elements as defined by the schema, just in the wrong order.
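To make the failure concrete, here is a minimal sketch using the JDK's `javax.xml.validation` API. The inline schema, the class name `OrderValidation` and the helper `isValid` are assumptions made up for this illustration; only the `<A>`, `<X>`, `<Y>` names come from the example above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class OrderValidation {
    // Hypothetical schema: <A> must contain <X> followed by <Y>.
    static final String EXAMPLE_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='A'><xs:complexType><xs:sequence>"
      + "<xs:element name='X'/><xs:element name='Y'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    /** Returns true if the XML text validates against the XSD text. */
    static boolean isValid(String xsd, String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            // A broken schema is a programming error, not a validation result.
            throw new IllegalArgumentException("Schema could not be compiled", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the instance document violates the schema
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(EXAMPLE_XSD, "<A><Y/><X/></A>"));
        System.out.println(isValid(EXAMPLE_XSD, "<A><X/><Y/></A>"));
    }
}
```

The first document is rejected purely because of element order; the second one, with the same elements swapped, is accepted.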

What I want to do is to programmatically examine the Schema (probably using XSOM - which is an object model for XML Schema), and ask it what the contents of <A> should be. The API will expose the information that "you need an <X> followed by a <Y>".
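As a rough illustration of the kind of query meant here, the sketch below does not use XSOM at all. It reads the XSD as an ordinary XML document with the JDK's DOM parser and handles only the simplest case: a top-level element with an anonymous complex type whose content is a single `xs:sequence` of locally named elements. The class `SchemaOrder`, the method `childOrder` and the inline schema are hypothetical; a real solution would need a proper schema model to cope with named types, `ref`, groups, `xs:choice`, `xs:all` and occurrence constraints.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class SchemaOrder {
    static final String XS = "http://www.w3.org/2001/XMLSchema";

    // Hypothetical schema: <A> must contain <X> followed by <Y>.
    static final String EXAMPLE_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='A'><xs:complexType><xs:sequence>"
      + "<xs:element name='X'/><xs:element name='Y'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    /** Lists the child element names of a top-level element, in schema order. */
    static List<String> childOrder(String xsd, String parentName)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document schema = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xsd)));
        for (Element decl : children(schema.getDocumentElement(), "element")) {
            if (!parentName.equals(decl.getAttribute("name"))) {
                continue;
            }
            List<String> order = new ArrayList<>();
            // Only xs:element > xs:complexType > xs:sequence > xs:element is understood.
            for (Element type : children(decl, "complexType")) {
                for (Element seq : children(type, "sequence")) {
                    for (Element child : children(seq, "element")) {
                        order.add(child.getAttribute("name"));
                    }
                }
            }
            return order;
        }
        throw new IllegalArgumentException("No top-level element named " + parentName);
    }

    /** Direct child elements in the XML Schema namespace with the given local name. */
    static List<Element> children(Element parent, String localName) {
        List<Element> result = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && XS.equals(n.getNamespaceURI())
                    && localName.equals(n.getLocalName())) {
                result.add((Element) n);
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(childOrder(EXAMPLE_XSD, "A"));
    }
}
```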

So I take my XML document (using a DOM API) and re-arrange the <X> and <Y> accordingly, so that the document will now validate against the schema.
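The DOM half of that step might look like the sketch below. It assumes the required order is already available as a list of element names (however it was obtained from the schema), that the document was parsed without namespace awareness, and that each child name maps to one position. `DomReorder`, `reorderChildren` and `parse` are hypothetical names for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomReorder {
    /** Moves the child elements of parent into the order given by the name list. */
    static void reorderChildren(Element parent, List<String> order) {
        List<Element> elements = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                elements.add((Element) n);
            }
        }
        // List.sort is stable, so repeated elements keep their relative order.
        elements.sort(Comparator.comparingInt(e -> rank(order, e.getTagName())));
        // appendChild on a node that is already in the tree moves it to the end.
        // Text and comment nodes are left where they were, ahead of the elements.
        for (Element e : elements) {
            parent.appendChild(e);
        }
    }

    /** Position of name in the order list; unknown names sort last. */
    static int rank(List<String> order, String name) {
        int index = order.indexOf(name);
        return index < 0 ? Integer.MAX_VALUE : index;
    }

    static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<A><Y/><X/></A>");
        reorderChildren(doc.getDocumentElement(), List.of("X", "Y"));
        for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            System.out.println(n.getNodeName());
        }
    }
}
```

Applying the same routine recursively to each child would extend it to nested content, provided the order list for every parent can be looked up.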

It's important to understand what XSOM is here - it's a Java API which represents the information contained in an XML Schema, not the information contained in my instance document.

What I don't want to do is generate code from the schema, since the schema is unknown at build time. Furthermore, XSLT is no use, since the correct ordering of the elements is determined solely by the data dictionary contained in the schema.

Hopefully that's now explicit enough.

skaffman asked Sep 16 '09 21:09




2 Answers

I was stuck on the same problem for around two weeks before I finally had a breakthrough. This can be achieved using JAXB's marshalling/unmarshalling feature.

In JAXB, XML validation during marshalling and unmarshalling is optional. So when creating the Marshaller and Unmarshaller objects, we do not call the setSchema(schema) method. Omitting this call skips validation.

So now,

  1. If an element that is mandatory according to the XSD is missing from the XML, it is overlooked.
  2. If the XML contains a tag that is not in the XSD, no error is thrown, and the tag is absent from the new XML produced by the unmarshal/marshal round trip.
  3. If elements are out of sequence, they are reordered. This is done by the JAXB-generated POJOs that we pass in when creating the JAXBContext.
  4. If an element is misplaced inside some other tag, it is omitted from the new XML. No error is thrown during marshalling/unmarshalling.

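import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;

import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Marshaller;
import javax.xml.bind.Unmarshaller;

import org.apache.commons.io.FileUtils;
import org.xml.sax.InputSource;

public class JAXBSequenceUtil {
  public static void main(String[] args) throws JAXBException, IOException {

    String xml = FileUtils.readFileToString(new File(
            "./conf/out/Response_103_1015700001&^&IOF.xml"),
            StandardCharsets.UTF_8);

    System.out.println("Before marshalling : \n" + xml);
    String sequencedXml = correctSequence(xml,
            "org.acord.standards.life._2");
    System.out.println("After marshalling : \n" + sequencedXml);
  }

  /**
   * Round-trips the XML through JAXB so that elements come out in schema order.
   *
   * @param xml
   *            - XML string to be corrected for sequence.
   * @param jaxbPackage
   *            - package containing the JAXB classes generated from the XSD.
   * @return String - XML with corrected sequence
   * @throws JAXBException
   *             if unmarshalling or marshalling fails
   */
  public static String correctSequence(String xml, String jaxbPackage)
        throws JAXBException {
    JAXBContext jaxbContext = JAXBContext.newInstance(jaxbPackage);

    // No setSchema(...) call here, so the unmarshaller does not validate.
    Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
    Object txLifeType = unmarshaller.unmarshal(new InputSource(
            new StringReader(xml)));
    System.out.println(txLifeType);

    // Marshalling writes the elements in the order fixed by the generated classes.
    StringWriter stringWriter = new StringWriter();
    Marshaller marshaller = jaxbContext.createMarshaller();
    marshaller.marshal(txLifeType, stringWriter);

    return stringWriter.toString();
  }
}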
Anurag Pimpale answered Sep 19 '22 00:09


I don't have a good answer to this yet, but I have to note that there is potential for ambiguity here. Consider this schema (abbreviated: the xs:complexType and inner xs:sequence wrappers are omitted):

<xs:element name="root">
  <xs:choice>
    <xs:sequence>
      <xs:element name="foo"/>
      <xs:element name="bar">
        <xs:element name="dee"/>
        <xs:element name="dum"/>
      </xs:element>
    </xs:sequence>
    <xs:sequence>
      <xs:element name="bar">
        <xs:element name="dum"/>
        <xs:element name="dee"/>
      </xs:element>
      <xs:element name="foo"/>
    </xs:sequence>
  </xs:choice>
</xs:element>

and this input XML:

<root>
  <foo/>
  <bar>
    <dum/>
    <dee/>
  </bar>
</root>

This could be made to comply with the schema either by reordering <foo> and <bar>, or by reordering <dee> and <dum>. There doesn't seem to be any reason to prefer one over the other.
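The ambiguity can be checked mechanically with a toy model. The sketch below is not XSOM and does not parse any schema: `Spec`, `reachableByReordering` and `matchingBranches` are hypothetical names, the two branches of the choice are written out by hand, and child names are assumed to be unique within each parent.

```java
import java.util.Arrays;
import java.util.List;

class ReorderAmbiguity {
    /** An element name plus its children in a particular order (toy model). */
    static final class Spec {
        final String name;
        final List<Spec> children;

        Spec(String name, Spec... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    /** True if doc can be turned into target purely by reordering siblings. */
    static boolean reachableByReordering(Spec doc, Spec target) {
        if (!doc.name.equals(target.name)
                || doc.children.size() != target.children.size()) {
            return false;
        }
        for (Spec want : target.children) {
            Spec have = null;
            for (Spec candidate : doc.children) {
                if (candidate.name.equals(want.name)) {
                    have = candidate;
                }
            }
            if (have == null || !reachableByReordering(have, want)) {
                return false;
            }
        }
        return true;
    }

    /** Counts the choice branches that doc could be reordered to satisfy. */
    static long matchingBranches(Spec doc, List<Spec> branches) {
        return branches.stream().filter(b -> reachableByReordering(doc, b)).count();
    }

    // The two alternatives of the xs:choice, written out by hand.
    static List<Spec> branches() {
        return Arrays.asList(
            new Spec("root", new Spec("foo"),
                new Spec("bar", new Spec("dee"), new Spec("dum"))),
            new Spec("root", new Spec("bar", new Spec("dum"), new Spec("dee")),
                new Spec("foo")));
    }

    // The input document: foo, then bar containing dum, dee.
    static Spec input() {
        return new Spec("root", new Spec("foo"),
            new Spec("bar", new Spec("dum"), new Spec("dee")));
    }

    public static void main(String[] args) {
        System.out.println(matchingBranches(input(), branches()));
    }
}
```

Both branches are reachable from the input, so any reordering tool needs an explicit tie-breaking rule, for example "take the first matching branch" or "take the branch that needs the fewest moves".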

Pavel Minaev answered Sep 21 '22 00:09