Parsing XML with references to previous tags, and with children corresponding to subtypes of some class

I have to deal with (a variation of) the following scenario. My model classes are:

class Car {
    String brand;
    Engine engine;
}

abstract class Engine {
}

class V12Engine extends Engine {
    int horsePowers;
}

class V6Engine extends Engine {
    String fuelType;
}

And I have to deserialize (no need for serialization support ATM) the following input:

<list>

    <brand id="1">
        Volvo
    </brand>

    <car>
        <brand>BMW</brand>
        <v12engine horsePowers="300" />
    </car>

    <car>
        <brand refId="1" />
        <v6engine fuel="unleaded" />
    </car>

</list>

What I've tried / issues:

I've tried using XStream, but it expects me to write tags such as:

<engine class="cars.V12Engine">
    <horsePowers>300</horsePowers>
</engine>

etc. I don't want an <engine> tag; I want a <v6engine> tag or a <v12engine> tag.

Also, I need to be able to refer back to "predefined" brands based on identifiers, as shown with the brand id above (for instance, by maintaining a Map<Integer, String> predefinedBrands during the deserialization). I don't know if XStream is well suited for such a scenario.

I realize that this could be done "manually" with a push or pull parser (such as SAX or StAX) or a DOM library. I would, however, prefer to have some more automation. Ideally, I should be able to add classes (such as new Engines) and start using them in the XML right away. (XStream is by no means a requirement; the most elegant solution wins the bounty.)

asked Dec 27 '12 11:12 by aioobe


1 Answer

JAXB (javax.xml.bind) can do everything you're after, though some bits are easier than others. For the sake of simplicity I'm going to assume that all your XML files have a namespace - it's trickier if they don't but can be worked around using the StAX APIs.

<list xmlns="http://example.com/cars">

    <brand id="1">
        Volvo
    </brand>

    <car>
        <brand>BMW</brand>
        <v12engine horsePowers="300" />
    </car>

    <car>
        <brand refId="1" />
        <v6engine fuel="unleaded" />
    </car>

</list>

and assume a corresponding package-info.java of

@XmlSchema(namespace = "http://example.com/cars",
           elementFormDefault = XmlNsForm.QUALIFIED)
package cars;
import javax.xml.bind.annotation.*;
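
If your files really have no namespace, the StAX workaround I mentioned could look roughly like the sketch below. It only uses the JDK's javax.xml.stream classes. DefaultNamespaceReader and rootNamespaceOf are names I made up for illustration, and I am assuming that the JAXB implementation asks the reader for getNamespaceURI() on each element rather than getName(), which is not overridden here.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.util.StreamReaderDelegate;

/** Hypothetical wrapper: reports a fixed namespace for elements that have none. */
class DefaultNamespaceReader extends StreamReaderDelegate {
    private final String defaultNamespace;

    DefaultNamespaceReader(XMLStreamReader reader, String defaultNamespace) {
        super(reader);
        this.defaultNamespace = defaultNamespace;
    }

    @Override
    public String getNamespaceURI() {
        String uri = super.getNamespaceURI();
        // Un-namespaced elements come back as null (or "" in some parsers);
        // elements that already carry a namespace are left untouched.
        return (uri == null || uri.isEmpty()) ? defaultNamespace : uri;
    }

    /** Small demo helper: namespace the wrapper reports for the root element. */
    static String rootNamespaceOf(String xml, String defaultNamespace) {
        try {
            XMLStreamReader raw = XMLInputFactory.newFactory()
                    .createXMLStreamReader(new StringReader(xml));
            XMLStreamReader wrapped = new DefaultNamespaceReader(raw, defaultNamespace);
            wrapped.nextTag(); // advance to the root start tag
            String uri = wrapped.getNamespaceURI();
            wrapped.close();
            return uri;
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }
}
```

With JAXB, the wrapped reader would be passed to Unmarshaller.unmarshal(XMLStreamReader) in place of the raw one. Attributes are not affected, which matches the unqualified attributes used in the examples here.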

Engine type by element name

This is simple, using @XmlElementRef:

package cars;
import javax.xml.bind.annotation.*;

@XmlRootElement
@XmlAccessorType(XmlAccessType.FIELD)
public class Car {
    String brand;
    @XmlElementRef
    Engine engine;
}

@XmlRootElement
abstract class Engine {
}

@XmlRootElement(name = "v12engine")
@XmlAccessorType(XmlAccessType.FIELD)
class V12Engine extends Engine {
    @XmlAttribute
    int horsePowers;
}

@XmlRootElement(name = "v6engine")
@XmlAccessorType(XmlAccessType.FIELD)
class V6Engine extends Engine {
    // override the default attribute name, which would be fuelType
    @XmlAttribute(name = "fuel")
    String fuelType;
}

The various types of Engine are all annotated @XmlRootElement and marked with appropriate element names. At unmarshalling time the element name found in the XML is used to decide which of the Engine subclasses to use. So given XML of

<car xmlns="http://example.com/cars">
    <brand>BMW</brand>
    <v12engine horsePowers="300" />
</car>

and the following unmarshalling code, the assertions at the end all hold:

JAXBContext ctx = JAXBContext.newInstance(Car.class, V6Engine.class, V12Engine.class);
Unmarshaller um = ctx.createUnmarshaller();
Car c = (Car)um.unmarshal(new File("file.xml"));

assert "BMW".equals(c.brand);
assert c.engine instanceof V12Engine;
assert ((V12Engine)c.engine).horsePowers == 300;

To add a new type of Engine simply create the new subclass, annotate it with @XmlRootElement as appropriate, and add this new class to the list passed to JAXBContext.newInstance().

Cross-references for brands

JAXB has a cross-referencing mechanism based on @XmlID and @XmlIDREF but these require that the ID attribute be a valid XML ID, i.e. an XML name, and in particular not entirely consisting of digits. But it's not too difficult to keep track of the cross references yourself, as long as you don't require "forward" references (i.e. a <car> that refers to a <brand> that has not yet been "declared").

The first step is to define a JAXB class to represent the <brand> element:

package cars;

import javax.xml.bind.annotation.*;

@XmlRootElement
public class Brand {
  @XmlValue // i.e. the simple content of the <brand> element
  String name;

  // optional id and refId attributes (optional because they're
  // Integer rather than int)
  @XmlAttribute
  Integer id;

  @XmlAttribute
  Integer refId;
}

Now we need a "type adapter" to convert between the Brand object and the String required by Car, and to maintain the id/ref mapping:

package cars;

import javax.xml.bind.annotation.adapters.*;
import java.util.*;

public class BrandAdapter extends XmlAdapter<Brand, String> {
  private Map<Integer, Brand> brandCache = new HashMap<Integer, Brand>();

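  @Override
  public Brand marshal(String s) {
    // serialization is not needed, so marshalling is left unimplemented
    return null;
  }

  @Override
  public String unmarshal(Brand b) {
    if(b.id != null) {
      // this is a <brand id="..."> - cache it
      brandCache.put(b.id, b);
    }
    if(b.refId != null) {
      // this is a <brand refId="..."> - pull it from the cache
      // (null if that id has not been declared yet)
      b = brandCache.get(b.refId);
    }

    // and extract the name, tolerating an unresolved refId
    return (b == null || b.name == null) ? null : b.name.trim();
  }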
}

We link the adapter to the brand field of Car using another annotation:

@XmlRootElement
@XmlAccessorType(XmlAccessType.FIELD)
public class Car {
    @XmlJavaTypeAdapter(BrandAdapter.class)
    String brand;
    @XmlElementRef
    Engine engine;
}

The final part of the puzzle is to ensure that <brand> elements found at the top level get saved in the cache. Here is a complete example:

package cars;

import javax.xml.bind.*;
import java.io.File;
import java.util.*;

import javax.xml.stream.*;
import javax.xml.transform.stream.StreamSource;

public class Main {
  public static void main(String[] argv) throws Exception {
    List<Car> cars = new ArrayList<Car>();

    JAXBContext ctx = JAXBContext.newInstance(Car.class, V12Engine.class, V6Engine.class, Brand.class);
    Unmarshaller um = ctx.createUnmarshaller();

    // create an adapter, and register it with the unmarshaller
    BrandAdapter ba = new BrandAdapter();
    um.setAdapter(BrandAdapter.class, ba);

    // create a StAX XMLStreamReader to read the XML file
    XMLInputFactory xif = XMLInputFactory.newFactory();
    XMLStreamReader xsr = xif.createXMLStreamReader(new StreamSource(new File("file.xml")));

    xsr.nextTag(); // root <list> element
    xsr.nextTag(); // first <brand> or <car> child

    // read each <brand>/<car> in turn
    while(xsr.getEventType() == XMLStreamConstants.START_ELEMENT) {
      Object obj = um.unmarshal(xsr);

      // unmarshal from an XMLStreamReader leaves the reader pointing at
      // the event *after* the closing tag of the element we read.  If there
      // was a text node between the closing tag of this element and the opening
      // tag of the next then we will need to skip it.
      int event = xsr.getEventType();
      if(event != XMLStreamConstants.START_ELEMENT && event != XMLStreamConstants.END_ELEMENT) {
        xsr.nextTag();
      }

      if(obj instanceof Brand) {
        // top-level <brand> - hand it to the BrandAdapter so it can be
        // cached if necessary
        ba.unmarshal((Brand)obj);
      }
      if(obj instanceof Car) {
        cars.add((Car)obj);
      }
    }
    xsr.close();

    // at this point, cars contains all the Car objects we found, with
    // any <brand> refIds resolved.
  }
}

answered Oct 22 '22 23:10 by Ian Roberts