In my Java application I have to handle XML files with different schema versions (xsd files) simultaneously. The content of the XML files changed only a little between the different versions, so I'd like to use mostly the same code to handle them and just make some case distinctions depending on the version of the schema used.
Right now I'm parsing the XML files with a SAX parser and my own ContentHandler, ignoring the schema version and just checking if the tags I need for processing are present.
I'd really like to use JAXB to generate the classes for parsing the XML files. This way I could remove all the hardcoded strings (constants) from my Java code and work with the generated classes instead.
I compiled the schema versions to different packages v1, v2 and v3. Now I can create an Unmarshaller this way:
    JAXBContext jc = JAXBContext.newInstance(
            v1.Root.class, v2.Root.class, v3.Root.class );
    Unmarshaller u = jc.createUnmarshaller();
Now u.unmarshal( xmlInputStream ); gives me the Root class from the package matching the schema of the XML file.
Next I'll try to define an interface to access the common parts of the schemas. If you have done something like this before, please let me know. In the meantime I'm reading through the JAXB specs...
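One way such an interface could look is sketched here. This is only an illustration under assumptions: V1Root and V2Root are hypothetical stand-ins for the generated v1.Root and v2.Root classes (so the sketch compiles without JAXB), and the title and author fields are invented for the example, with author assumed to exist only from version 2 on.

```java
import java.util.Optional;

// Hypothetical stand-in for the JAXB-generated v1.Root (assumed to have no author).
class V1Root {
    private final String title;
    V1Root(String title) { this.title = title; }
    String getTitle() { return title; }
}

// Hypothetical stand-in for v2.Root, assumed to add an optional author element.
class V2Root {
    private final String title;
    private final String author;
    V2Root(String title, String author) { this.title = title; this.author = author; }
    String getTitle() { return title; }
    String getAuthor() { return author; }
}

// Version-independent view of the parts shared by all schema versions.
interface CommonRoot {
    String getTitle();
    // Empty when the underlying schema version has no author element.
    Optional<String> getAuthor();
}

final class CommonRoots {
    private CommonRoots() {}

    // Wraps whatever the unmarshaller returned in the common interface.
    static CommonRoot wrap(Object unmarshalled) {
        if (unmarshalled instanceof V1Root) {
            final V1Root root = (V1Root) unmarshalled;
            return new CommonRoot() {
                @Override public String getTitle() { return root.getTitle(); }
                @Override public Optional<String> getAuthor() { return Optional.empty(); }
            };
        }
        if (unmarshalled instanceof V2Root) {
            final V2Root root = (V2Root) unmarshalled;
            return new CommonRoot() {
                @Override public String getTitle() { return root.getTitle(); }
                @Override public Optional<String> getAuthor() {
                    return Optional.ofNullable(root.getAuthor());
                }
            };
        }
        throw new IllegalArgumentException("Unsupported root type: " + unmarshalled);
    }
}
```

With the real generated classes, the result of u.unmarshal( xmlInputStream ) would be passed to CommonRoots.wrap(...), and the rest of the code would only see CommonRoot. The version-specific case distinctions then live in one place.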
First, you need some way to identify the schema appropriate for the particular instance document. You say that the documents have a schemaLocation attribute, so this is one solution. Note, however, that you have to specifically configure the parser to use this attribute, and a malicious document could specify a schema location that you don't control. Instead, I'd recommend getting the attribute value, and using it to find the appropriate schema in an internal table.
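The lookup-table idea can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the namespace URIs, the resource paths, and the choice to key the table on the first token of xsi:schemaLocation (the namespace) are all assumptions made for the example. The document is parsed without validation, so the location hint inside it is read as plain data and never fetched.

```java
import java.io.ByteArrayInputStream;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class SchemaLookup {
    private SchemaLookup() {}

    // Hypothetical internal table: namespace URI -> XSD bundled with the application.
    private static final Map<String, String> KNOWN_SCHEMAS = Map.of(
            "http://example.com/data/v1", "/schemas/data-v1.xsd",
            "http://example.com/data/v2", "/schemas/data-v2.xsd",
            "http://example.com/data/v3", "/schemas/data-v3.xsd");

    // Reads xsi:schemaLocation from the root element; returns "" if it is absent.
    static String declaredSchemaLocation(byte[] xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
            return doc.getDocumentElement().getAttributeNS(
                    XMLConstants.W3C_XML_SCHEMA_INSTANCE_NS_URI, "schemaLocation");
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read schemaLocation", e);
        }
    }

    // Maps the declared value to a schema we control; unknown values are rejected.
    static String resolveSchemaResource(String schemaLocation) {
        // schemaLocation holds whitespace-separated "namespace location" pairs;
        // only the first namespace is used here, the location hint is ignored.
        String[] tokens = schemaLocation.trim().split("\\s+");
        String resource = KNOWN_SCHEMAS.get(tokens[0]);
        if (resource == null) {
            throw new IllegalArgumentException("Unknown schema: " + schemaLocation);
        }
        return resource;
    }
}
```

A caller would pass the result of SchemaLookup.declaredSchemaLocation(bytes) to SchemaLookup.resolveSchemaResource(...) and load the returned resource from the classpath. A document that points at a schema outside the table fails fast instead of being validated against something you don't control.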
Next is access to the data. You don't say why you're using three different schemas. The only rational reason is an evolving data spec (i.e., the schemas represent versions 1, 2, and 3 of the same data). If that's not your reason, then you need to rethink your design.
If you are trying to support an evolving data spec, then you need to answer the question "how do I deal with data that's missing?" There are a couple of answers to this. One is to maintain multiple versions of the code. With refactoring of common functionality, this is not a bad idea, but it can easily become unmaintainable.
The alternative is to use a single codebase, and some sort of adapter object that incorporates your rules. And if you go down this path, JAXB is the wrong solution, since it is tied to a schema. You might be able to use a permissive XML->Java converter: I believe XStream will work, and I know that the 1.1 release of Practical XML will work (since I wrote it) -- although you'd have to build it yourself.
Another, better alternative, depending on the complexity of the schema, is to develop a set of objects that use XPath to retrieve the data. I would probably implement this using a "master" object that contains XPath expressions for every field, in every variant of the schema. Then create lightweight "wrapper" objects that hold a DOM version of your instance document, and use the XPath appropriate to the schema. Note, however, that this is limited to read-only access.
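A rough sketch of that "master" plus "wrapper" arrangement follows. Everything concrete in it is assumed for illustration: the order/customer document shape, the customerName field, and the two versions (V1 stores the name as element text, V2 moves it into a nested name element). The sample documents use no namespaces; real schemas usually do, which would additionally require a NamespaceContext on the XPath object.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// "Master" object: one XPath per field, for every schema variant.
final class FieldPaths {
    enum Version { V1, V2 }

    private FieldPaths() {}

    // Hypothetical paths: V2 is assumed to have moved the name into a child element.
    private static final Map<Version, Map<String, String>> PATHS = Map.of(
            Version.V1, Map.of("customerName", "/order/customer"),
            Version.V2, Map.of("customerName", "/order/customer/name"));

    static String pathFor(Version version, String field) {
        String path = PATHS.get(version).get(field);
        if (path == null) {
            throw new IllegalArgumentException("No path for " + field + " in " + version);
        }
        return path;
    }
}

// Lightweight read-only "wrapper" around the DOM of one instance document.
final class OrderView {
    private final Document doc;
    private final FieldPaths.Version version;
    // XPath objects are not thread-safe, so each view keeps its own.
    private final XPath xpath = XPathFactory.newInstance().newXPath();

    private OrderView(Document doc, FieldPaths.Version version) {
        this.doc = doc;
        this.version = version;
    }

    static OrderView parse(String xml, FieldPaths.Version version) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return new OrderView(doc, version);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse document", e);
        }
    }

    String getCustomerName() {
        return read("customerName");
    }

    // Evaluates the version-specific XPath and returns the node's string value.
    private String read(String field) {
        try {
            return xpath.evaluate(FieldPaths.pathFor(version, field), doc);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Bad XPath for field " + field, e);
        }
    }
}
```

Calling code only ever asks an OrderView for getCustomerName(); which expression gets evaluated is decided by the version the view was created with. Supporting a new schema version then means adding one more entry to the table in FieldPaths rather than touching the callers.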