Say I have an XML document (represented as text, a W3C DOM, whatever), and also an XML Schema. The XML document has all the right elements as defined by the schema, but in the wrong order.
How do I use the schema to "re-order" the elements in the document to conform to the ordering defined by the schema?
I know that this should be possible, probably using XSOM, since the JAXB XJC code generator annotates its generated classes with the correct serialization order of the elements.
However, I'm not familiar with the XSOM API, and it's pretty dense, so I'm hoping one of you lot has some experience with it, and can point me in the right direction. Something like "what child elements are permitted inside this parent element, and in what order?"
Let me give an example.
I have an XML document like this:
<A>
<Y/>
<X/>
</A>
I have an XML Schema which says that the contents of <A> must be an <X> followed by a <Y>. Now clearly, if I try to validate the document against the schema, it fails, since the <X> and <Y> are in the wrong order. But I know my document is "wrong" in advance, so I'm not using the schema to validate just yet. However, I do know that my document has all of the correct elements as defined by the schema, just in the wrong order.
What I want to do is to programmatically examine the Schema (probably using XSOM, which is an object model for XML Schema), and ask it what the contents of <A> should be. The API will expose the information that "you need an <X> followed by a <Y>".
So I take my XML document (using a DOM API) and re-arrange <X> and <Y> accordingly, so that the document will now validate against the schema.
It's important to understand what XSOM is here: it's a Java API which represents the information contained in an XML Schema, not the information contained in my instance document.
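To make the kind of query concrete, here is a minimal sketch that does not use XSOM at all: it reads the schema as an ordinary XML document with the JDK's DOM parser and lists the element names found in the xs:sequence of a named element declaration. The class and method names (SchemaOrder, childOrder) are hypothetical, and it assumes the simplest possible schema shape: an anonymous xs:complexType with a flat xs:sequence of locally named elements. It ignores named types, ref= declarations, xs:choice, xs:all, nested groups, and occurrence constraints, all of which a real schema model such as XSOM would have to resolve.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: not XSOM, just the schema read as plain XML.
final class SchemaOrder {
    private static final String XS = XMLConstants.W3C_XML_SCHEMA_NS_URI;

    /** Names declared in the xs:sequence of the first matching element declaration, in order. */
    static List<String> childOrder(String xsd, String parentName) {
        Document schema;
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // needed to match the xs: namespace
            schema = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xsd)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse schema", e);
        }
        List<String> order = new ArrayList<>();
        NodeList decls = schema.getElementsByTagNameNS(XS, "element");
        for (int i = 0; i < decls.getLength(); i++) {
            Element decl = (Element) decls.item(i);
            if (!parentName.equals(decl.getAttribute("name"))) continue;
            // Assumption: anonymous complexType containing a flat sequence.
            Element type = firstChild(decl, "complexType");
            Element seq = (type == null) ? null : firstChild(type, "sequence");
            if (seq == null) continue;
            for (Node n = seq.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (isXs(n, "element")) order.add(((Element) n).getAttribute("name"));
            }
            break;
        }
        return order;
    }

    private static boolean isXs(Node n, String localName) {
        return n instanceof Element
                && XS.equals(n.getNamespaceURI())
                && localName.equals(n.getLocalName());
    }

    private static Element firstChild(Element parent, String localName) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (isXs(n, localName)) return (Element) n;
        }
        return null;
    }
}
```

For the <A> example, calling SchemaOrder.childOrder(xsd, "A") on a schema that declares A as a sequence of X then Y gives the names X and Y in that order.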
What I don't want to do is generate code from the schema, since the schema is unknown at build time. Furthermore, XSLT is no use, since the correct ordering of the elements is determined solely by the data dictionary contained in the schema.
Hopefully that's now explicit enough.
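The DOM half of the problem, once the expected order is known, might look roughly like the sketch below. It assumes the order has already been obtained from the schema as a flat list of element names; the class and method names (DomReorder, reorderChildren, parse, childNames) are made up for illustration. Elements whose names are not in the list are moved to the end, repeated elements keep their relative order, and whitespace text nodes are left where they are.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only.
final class DomReorder {

    /** Re-appends the element children of parent in the order given by the name list. */
    static void reorderChildren(Element parent, List<String> order) {
        List<Element> children = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) children.add((Element) n);
        }
        // List.sort is stable, so repeated elements keep their relative order.
        children.sort(Comparator.comparingInt((Element e) -> rank(order, e)));
        // appendChild on a node already in the tree moves it to the end.
        for (Element child : children) parent.appendChild(child);
    }

    private static int rank(List<String> order, Element e) {
        String name = (e.getLocalName() != null) ? e.getLocalName() : e.getTagName();
        int i = order.indexOf(name);
        return (i < 0) ? Integer.MAX_VALUE : i; // unknown names sort last
    }

    /** Parses an XML string into a namespace-aware DOM document. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
    }

    /** Local names of the element children, in document order. */
    static List<String> childNames(Element parent) {
        List<String> names = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) names.add(n.getLocalName());
        }
        return names;
    }
}
```

This only handles one level of a plain sequence. A full solution would have to walk the schema's content model recursively and deal with choices, optional and repeated particles, and namespaces.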
I was stuck with the same problem for around two weeks before I finally got a breakthrough. This can be achieved using JAXB's marshalling/unmarshalling feature.
In JAXB marshal/unmarshal, XML validation is an optional feature. So when creating the Marshaller and Unmarshaller objects, we do not call the setSchema(schema) method. Omitting this step skips XML validation during marshal/unmarshal, so the out-of-order document is accepted on the way in and written back in the order defined by the generated classes.
So now:
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Marshaller;
import javax.xml.bind.Unmarshaller;

import org.apache.commons.io.FileUtils;
import org.xml.sax.InputSource;

public class JAXBSequenceUtil {

    public static void main(String[] args) throws JAXBException, IOException {
        String xml = FileUtils.readFileToString(new File(
                "./conf/out/Response_103_1015700001&^&IOF.xml"));
        System.out.println("Before marshalling : \n" + xml);
        String sequencedXml = correctSequence(xml,
                "org.acord.standards.life._2");
        System.out.println("After marshalling : \n" + sequencedXml);
    }

    /**
     * @param xml
     *            - XML string to be corrected for sequence.
     * @param jaxbPackage
     *            - package containing JAXB generated classes using XSD.
     * @return String - xml with corrected sequence
     * @throws JAXBException
     */
    public static String correctSequence(String xml, String jaxbPackage)
            throws JAXBException {
        JAXBContext jaxbContext = JAXBContext.newInstance(jaxbPackage);

        // No setSchema(...) call here, so the unmarshaller does not validate
        // and does not reject the out-of-order elements.
        Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
        Object txLifeType = unmarshaller.unmarshal(new InputSource(
                new StringReader(xml)));
        System.out.println(txLifeType);

        // Marshalling writes the elements back in the order declared on the
        // generated classes, which follows the schema.
        StringWriter stringWriter = new StringWriter();
        Marshaller marshaller = jaxbContext.createMarshaller();
        marshaller.marshal(txLifeType, stringWriter);
        return stringWriter.toString();
    }
}
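Since neither the Unmarshaller nor the Marshaller above has a schema set, nothing in that round trip confirms that the output really conforms. One way to check afterwards is the JDK's own javax.xml.validation API, sketched below. The class and method names (SchemaCheck, firstError) are hypothetical, and the schema and document are passed as strings purely to keep the example self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration only.
final class SchemaCheck {

    /** Returns null if xml is valid against xsd, otherwise the first validation error message. */
    static String firstError(String xsd, String xml) {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema;
        try {
            schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            // The schema itself is broken; that is a caller error, not a validation result.
            throw new IllegalArgumentException("schema did not compile", e);
        }
        try {
            // With no ErrorHandler set, the validator throws on the first error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Running firstError on the string returned by correctSequence, together with the XSD the classes were generated from, should return null if the re-marshalled document is in schema order.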
I don't have a good answer to this yet, but I have to note that there is potential for ambiguity here. Consider this schema (abbreviated: the xs:complexType wrappers are omitted):
<xs:element name="root">
  <xs:choice>
    <xs:sequence>
      <xs:element name="foo"/>
      <xs:element name="bar">
        <xs:element name="dee"/>
        <xs:element name="dum"/>
      </xs:element>
    </xs:sequence>
    <xs:sequence>
      <xs:element name="bar">
        <xs:element name="dum"/>
        <xs:element name="dee"/>
      </xs:element>
      <xs:element name="foo"/>
    </xs:sequence>
  </xs:choice>
</xs:element>
and this input XML:
<root>
  <foo/>
  <bar>
    <dum/>
    <dee/>
  </bar>
</root>
This could be made to comply with the schema either by reordering <foo> and <bar>, or by reordering <dee> and <dum>. There doesn't seem to be any reason to prefer one over the other.