Loading OOP structure from XML--how to design code

Question

I have a structure consisting of a number of classes, kind of like so:

Document
- Track (each document can have multiple tracks)
  - Clip (each track can have multiple clips)
- (Other types may be added in the future)

I'm storing these documents as XML, like so: <Document><Track><Clip>...</Clip></Track></Document>

In each class, I have a toXML() method, which describes its contents in XML form. Document::toXML() is responsible for calling toXML() on its children, and combining the result. This way, saving is quite trivial and easy to extend in my opinion.

But now I'm having trouble with how to design the loading code.

There are two ways I can think of:

1: Huge if-statement in Document::fromXML(), something like:

// Pseudo code
for each element {
   if element.name == "Track" createTrack(element)
   if element.name == "Clip" createClipOnLastTrack(element)
   // .. more statements for new types
}

2: A "loader" class which keeps track of loading methods for all types, something like:

// Track.fromXML will be responsible for calling Clip.fromXML
Loader.register("Track", &Track.fromXML)
Loader.register("OtherType", &OtherType.fromXML)
// More register() calls for new types

for each element {
   // Call appropriate method based on tag name
   Loader.load(element.name, element)
}

I don't really like #1, it feels clumsy. #2 feels better, but I'm not sure if it's a good design.

Is it? Is there some other common/popular way of translating an XML document to a set of actual object instances?

user1201210 · Accepted Answer

I think the first approach is reasonable based on what I know about your problem. The second approach seems more complex than necessary unless you're confident that the mappings will always stay simple and you'll be making a lot of mapping changes.
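As a reference point for that second approach, here is a minimal sketch of what the registration-based loader from the question might look like in C++. Everything in it is an assumption for illustration: `XmlNode` is a hypothetical stand-in for whatever your XML parser hands you, and the method is called `registerType` because `register` is a reserved word in C++.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a parsed XML element (not a real library type).
struct XmlNode {
    std::string name;
};

class Loader {
public:
    using LoadFn = std::function<void(const XmlNode&)>;

    // Associates a tag name with the function that knows how to load it.
    void registerType(const std::string& tag, LoadFn fn) {
        table_[tag] = std::move(fn);
    }

    // Dispatches on the element's tag name; unknown tags are reported as errors.
    void load(const XmlNode& element) const {
        auto it = table_.find(element.name);
        if (it == table_.end()) {
            throw std::runtime_error("no loader registered for <" + element.name + ">");
        }
        it->second(element);
    }

private:
    std::unordered_map<std::string, LoadFn> table_;
};
```

Note that the table only replaces the if-chain. Each registered function still has to decide where the object it creates ends up (which document, which track), and that is the part that tends to grow.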

To give a more thorough answer, though, I'll cover three approaches that I would consider in your position. They differ in coupling and complexity. You've essentially covered two of them already, but I'll flesh them out a bit more. I don't consider any of them to be "authoritative solutions" since I don't know the full scope of your problem, but I think they're worth mentioning.

The highest coupling, least complex approach I would consider is a set of static factory functions in Document, Track, and Clip, which is essentially the first option you mention. I haven't seen this approach used much, but it's likely that other developers have. It has a Ruby/ActiveRecord feel to it to me (which isn't a judgement, just a random thought).

//all examples are C++ish pseudo-code
Document* Document::fromXML(SomeXMLStream* stream) { 
    Document* doc = new Document();
    //read the details specific to Document, populate *doc

    //for each <Track> child in the stream...
    Track* track = Track::fromXML(stream);
    //add the track to *doc

    return doc;
}

Track* Track::fromXML(SomeXMLStream* stream) { 
    Track* track = new Track();
    //similar steps here

    //for each <Clip> child in the stream...
    Clip* clip = Clip::fromXML(stream);
    //and so on

    return track;
}

//similar code for Clip::fromXML(...)

The high coupling (i.e., the classes knowing about the XML) gives you the advantage of putting the fromXML logic right next to the toXML logic since it's reasonable -- and convenient -- to have the writer and the reader be defined in the same place. A change in the XML layout requires two changes (one in fromXML and the other in toXML), but the changes take place in one file.

The downside of this approach is the same downside that comes from coding toXML in the classes themselves: you'd better like the XML the way it is because it's going to be hard-coded. But if you're committed to your toXML implementation, I see nothing wrong with committing to the same approach with fromXML.
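If it helps to see the first approach in compilable form, here is a rough sketch under a few assumptions of mine. Instead of the stream from the pseudo-code it reads from a hypothetical, already-parsed `XmlNode` tree, the `source` attribute on a clip is an invented example field, and it returns `std::unique_ptr` rather than raw pointers so ownership is explicit.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for an already-parsed XML element (not a real library type).
struct XmlNode {
    std::string name;
    std::map<std::string, std::string> attributes;
    std::vector<XmlNode> children;
};

class Clip {
public:
    std::string source;  // assumed example field

    static std::unique_ptr<Clip> fromXML(const XmlNode& node) {
        auto clip = std::make_unique<Clip>();
        auto it = node.attributes.find("source");
        if (it != node.attributes.end()) clip->source = it->second;
        return clip;
    }
};

class Track {
public:
    std::vector<std::unique_ptr<Clip>> clips;

    // Track decides which child elements it accepts and delegates to them.
    static std::unique_ptr<Track> fromXML(const XmlNode& node) {
        auto track = std::make_unique<Track>();
        for (const XmlNode& child : node.children) {
            if (child.name == "Clip") track->clips.push_back(Clip::fromXML(child));
        }
        return track;
    }
};

class Document {
public:
    std::vector<std::unique_ptr<Track>> tracks;

    static std::unique_ptr<Document> fromXML(const XmlNode& node) {
        auto doc = std::make_unique<Document>();
        for (const XmlNode& child : node.children) {
            if (child.name == "Track") doc->tracks.push_back(Track::fromXML(child));
        }
        return doc;
    }
};
```

Each class only checks for the child types it owns, so the "huge if-statement" gets split into small pieces that mirror the toXML nesting.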

The second approach I would consider introduces deserializers (or mappers or marshallers or whatever you prefer to call them) to serve as the arbiter over the XML. The coupling between XML and model moves out of Document, Track, and Clip and into these deserializers. I have seen this approach used often "in the field" with both hand-written and auto-generated code.

Document* DocumentDeserializer::fromXML(SomeXMLStream* stream) {
    Document* doc = new Document();
    //read the details specific to Document, populate *doc

    //for each <Track> child in the stream...
    Track* track = TrackDeserializer::fromXML(stream);
    //add the track to *doc

    return doc;
}

//similar code for Track and Clip

The obvious downside to this approach is that now you're writing XML in the classes with toXML but reading it in with the deserializers, so a change to the XML layout means a change to two classes. If toXML gets moved into the same class (maybe call the classes <ModelClassName>XMLMapper), this downside goes away.

Another minor downside is that keeping the model class and the XML in sync becomes a little more complex, since a pair of files (the model class and the deserializer class) needs to be modified with each change. This may be worth it just to get the XML code out of the model classes.

The decoupling gained from this approach simplifies the model classes and allows you more flexibility with future input and output, for example using something other than XML to store and transmit the objects. It also cordons off the XML-specific code into its own set of files.
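To make the second approach concrete, here is a sketch of the combined-mapper variant, where reading and writing live together and the model classes contain no XML code at all. The `XmlNode` tree, the `source` field, and the `...XMLMapper` names are assumptions for illustration, not an existing API.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed XML element (not a real library type).
struct XmlNode {
    std::string name;
    std::map<std::string, std::string> attributes;
    std::vector<XmlNode> children;
};

// The model classes know nothing about XML.
struct Clip { std::string source; };  // assumed example field
struct Track { std::vector<Clip> clips; };
struct Document { std::vector<Track> tracks; };

struct ClipXMLMapper {
    static Clip fromXML(const XmlNode& node) {
        Clip clip;
        auto it = node.attributes.find("source");
        if (it != node.attributes.end()) clip.source = it->second;
        return clip;
    }
    static XmlNode toXML(const Clip& clip) {
        XmlNode node;
        node.name = "Clip";
        node.attributes["source"] = clip.source;
        return node;
    }
};

struct TrackXMLMapper {
    static Track fromXML(const XmlNode& node) {
        Track track;
        for (const XmlNode& child : node.children) {
            if (child.name == "Clip") track.clips.push_back(ClipXMLMapper::fromXML(child));
        }
        return track;
    }
    static XmlNode toXML(const Track& track) {
        XmlNode node;
        node.name = "Track";
        for (const Clip& clip : track.clips) node.children.push_back(ClipXMLMapper::toXML(clip));
        return node;
    }
};

struct DocumentXMLMapper {
    static Document fromXML(const XmlNode& node) {
        Document doc;
        for (const XmlNode& child : node.children) {
            if (child.name == "Track") doc.tracks.push_back(TrackXMLMapper::fromXML(child));
        }
        return doc;
    }
    static XmlNode toXML(const Document& doc) {
        XmlNode node;
        node.name = "Document";
        for (const Track& track : doc.tracks) node.children.push_back(TrackXMLMapper::toXML(track));
        return node;
    }
};
```

Supporting another storage format later would mean adding a parallel set of mappers while leaving Document, Track, and Clip untouched.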

The lowest coupling, most complex approach I'd consider is similar to the deserializer/mapper approach just mentioned, but with the mapping details abstracted out in a more declarative manner -- similar to your second approach. I've seen this approach used in luabind and other "C++ to scripting language" mappings.

void DocumentDeserializer::configureDeserializer() { 
    //XMLMapping<T> is a templated mapping class that 
    //maps an element name to a field of T and deserializer function.
    XMLMapping<Document>::registerElementMapping("track", &Document::tracks, &TrackDeserializer::fromXML); 

    //Example of registering a new element that doesn't need a special deserializer.
    XMLMapping<Document>::registerElementMapping("name", &Document::name);

}

Document* DocumentDeserializer::fromXML(SomeXMLStream* stream) {
    Document* doc = new Document();

    //Allow the mapper to handle the details.
    XMLMapping<Document>::map(stream, doc);
    return doc;
}

//similar code for Track and Clip

The coupling between XML and model class is still there in the code, but now it's declared in one place (configureDeserializer) and executed in another (fromXML). This separation simplifies adding new elements later since it's now a matter of adding one line to the end of the list of mappings.

The downside is that unknown quantity, the XMLMapping<T> class: how much complexity must it handle? Should it handle getter and setter methods or talk directly to fields? How would it handle string values that have special formatting, like dates? What if two elements need to be read to populate one field, or one element populates two fields? As convenient as the mapping approach could be, it might take a long, painful time just to make it work, and cases that are simple to code in the first two approaches could be very difficult to turn into mappings in this approach.
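To give a feel for how much even a tiny XMLMapping<T> has to do, here is one possible bare-bones version. It is only a sketch under my own assumptions: it reads from a hypothetical `XmlNode` tree rather than a stream, talks directly to public fields, and supports just two cases (a text element mapped to a `std::string` field, and a repeated child element appended to a `std::vector` field). Dates, setters, and multi-element fields are all left out.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed XML element (not a real library type).
struct XmlNode {
    std::string name;
    std::string text;
    std::vector<XmlNode> children;
};

template <typename T>
class XMLMapping {
public:
    // Maps a child element's text straight onto a std::string field of T.
    static void registerElementMapping(const std::string& element, std::string T::*field) {
        handlers()[element] = [field](const XmlNode& node, T& target) {
            target.*field = node.text;
        };
    }

    // Maps a repeated child element onto a vector field of T via a deserializer function.
    template <typename Child>
    static void registerElementMapping(const std::string& element,
                                       std::vector<Child> T::*field,
                                       Child (*deserialize)(const XmlNode&)) {
        handlers()[element] = [field, deserialize](const XmlNode& node, T& target) {
            (target.*field).push_back(deserialize(node));
        };
    }

    // Applies every registered mapping to the children of node; unknown elements are skipped.
    static void map(const XmlNode& node, T& target) {
        for (const XmlNode& child : node.children) {
            auto it = handlers().find(child.name);
            if (it != handlers().end()) it->second(child, target);
        }
    }

private:
    using Handler = std::function<void(const XmlNode&, T&)>;
    static std::map<std::string, Handler>& handlers() {
        static std::map<std::string, Handler> table;
        return table;
    }
};

struct Track { std::string name; };
struct Document { std::string name; std::vector<Track> tracks; };

struct TrackDeserializer {
    static void configureDeserializer() {
        XMLMapping<Track>::registerElementMapping("name", &Track::name);
    }
    static Track fromXML(const XmlNode& node) {
        Track track;
        XMLMapping<Track>::map(node, track);
        return track;
    }
};

struct DocumentDeserializer {
    static void configureDeserializer() {
        XMLMapping<Document>::registerElementMapping("name", &Document::name);
        XMLMapping<Document>::registerElementMapping("track", &Document::tracks,
                                                     &TrackDeserializer::fromXML);
    }
    static Document fromXML(const XmlNode& node) {
        Document doc;
        XMLMapping<Document>::map(node, doc);
        return doc;
    }
};
```

Both configureDeserializer() functions have to run once before the first fromXML call. Every case beyond these two would need another overload or a custom handler, which is exactly where the complexity I mentioned starts to pile up.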

So those are the three approaches I would consider. There are lots of variations you could build on them (for example, using a scripting language like Lua to manage the mapping in the second approach), and I'm sure there are approaches out there that I haven't considered. Still, I hope this gives you something to think about and that you ultimately find a solution you're comfortable with.

Tags:

oop

xml

tacospice
