
How to preserve XML nodes that are not bound to an object when using SAX for parsing

I am working on an Android app that interfaces with a Bluetooth camera. For each clip stored on the camera we keep some fields about the clip (some of which the user can change) in an XML file.

Currently this app is the only one writing this XML data to the device, but in the future a desktop app or an iPhone app may write data here too. I don't want to assume that another app won't have additional fields (especially a newer version of the app that adds fields this version doesn't support yet).

What I want to prevent is a situation where another application adds new fields to this XML file, and then the user opens the Android app and it wipes out those fields because it doesn't know about them.

So let's take a hypothetical example:

<data>
  <title>My Title</title>
  <date>12/24/2012</date>
  <category>Blah</category>
</data>

When read from the device this would get translated to a Clip object that looks like this (simplified for brevity)

public class Clip {
  public String title, category;
  public Date date;
}

So I'm using SAX to parse the data and store it in a Clip. I simply collect the characters in a StringBuilder and write them out when I reach the end element for title, category and date.

I realized though that when I write this data back to the device, if there were any other tags in the original document they would not get written because I only write out the fields I know about.

This makes me think that maybe SAX is the wrong option and perhaps I should use DOM or something else where I could more easily write out any other elements that existed originally.

Alternatively, I was thinking my Clip class could contain an ArrayList of some generic XML type (maybe DOM nodes). In startElement I would check whether the element is one of the predefined tags, and if not, store the whole structure until I reach the end of that tag (but in what?). Then when writing back out I would go through all of the additional tags and write them to the XML file, along with the fields I know about.

Is this a common problem with a good known solution?

-- Update 5/22/12 --

I didn't mention that in the actual XML the root node (actually called annotation) carries a version number, which has been set to 1. In the short term I'm going to require that the version my app supports is >= the version of the XML data. If the XML has a greater version number, I will still attempt to parse it for reading but will deny any saves to the model. I'm still interested in any kind of working example of how to do this, though.
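A minimal sketch of the version gate described above. The attribute name `version`, the class name `VersionGate`, and the choice to treat a missing version as current are all assumptions made for illustration; in practice this check would probably live in the existing Clip handler rather than in a handler of its own.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: reads the version from the root element
// and decides whether this app may write the document back.
class VersionGate extends DefaultHandler {
    static final int SUPPORTED_VERSION = 1; // highest version this app understands

    private boolean rootSeen = false;
    // Assumption: a document without a version attribute counts as current.
    private int documentVersion = SUPPORTED_VERSION;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (rootSeen) {
            return; // only the root element carries the version
        }
        rootSeen = true;
        String raw = atts.getValue("version"); // assumed attribute name
        if (raw != null) {
            try {
                documentVersion = Integer.parseInt(raw.trim());
            } catch (NumberFormatException e) {
                documentVersion = Integer.MAX_VALUE; // unreadable: treat as newer, so read-only
            }
        }
    }

    // Saving is allowed only when the document is not newer than this app.
    boolean canSave() {
        return documentVersion <= SUPPORTED_VERSION;
    }

    static VersionGate read(String xml) throws Exception {
        VersionGate gate = new VersionGate();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), gate);
        return gate;
    }
}
```

Calling `VersionGate.read(xml).canSave()` before enabling the save action keeps a newer document readable without risking its unknown fields.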

BTW, I thought of another solution that should be pretty easy: use XPath to find the nodes that I know about and replace their content when the data is updated. However, I ran some benchmarks, and the overhead of parsing the XML into memory is absurd. Just the parsing operation, without any lookups, was 20 times slower than SAX, and with XPath parsing was generally 30-50 times slower, which is really bad considering I parse these in a list view. So my idea is to keep SAX for parsing the nodes into clips, but store the entire XML in a variable of the Clip class (remember, this XML is short, less than 2 KB). Then when I write the data back out I can use XPath to replace the nodes that I know about in the original XML.

Still interested in any other solutions. I probably won't accept an answer unless it includes some code examples.
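The write-back half of that idea could look roughly like the sketch below. It uses only the standard JAXP classes; the class name `RawXmlUpdater` and the method `setText` are hypothetical. The DOM parse happens only on save, so the list view still pays only for SAX.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: patches one known node in the stored original XML
// and leaves every other node exactly where it was.
class RawXmlUpdater {
    static String setText(String originalXml, String xpathExpr, String newValue) throws Exception {
        // Parse the raw XML that was kept alongside the Clip.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(originalXml)));

        // Locate the node this app knows about and replace its text.
        XPath xpath = XPathFactory.newInstance().newXPath();
        Node node = (Node) xpath.evaluate(xpathExpr, doc, XPathConstants.NODE);
        if (node != null) {
            node.setTextContent(newValue);
        }

        // Serialize the whole tree, unknown elements included.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<data><title>My Title</title><rating>5</rating></data>";
        System.out.println(setText(xml, "/data/title", "My Title Updated"));
    }
}
```

In this sketch a missing node is silently skipped; creating it instead would be a small change, depending on what the other apps expect.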

Matt Wolfe asked May 18 '12 07:05


2 Answers

Here's how you can go about it with SAX filters:

  1. When you read your document with SAX you record all the events. You record them and bubble them up further to the next level of SAX reader. You basically stack together two layers of SAX readers (with XMLFilter) - one will record and relay, and the other one is your current SAX handler that creates objects.
  2. When you're ready to write your modifications back to disk you fire up the recorded SAX events layered with your writer that would overwrite those values/nodes you have altered.
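Step 1 can be sketched with `XMLFilterImpl` from the standard SAX API. This is not the code from the linked repository: `RecordingFilter`, `Event` and `TitleHandler` are hypothetical names, and only element and text events are recorded (comments, processing instructions and namespace mappings are ignored).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical recorder: stores every element/text event, then relays it downstream.
class RecordingFilter extends XMLFilterImpl {
    // One recorded event; kind is "start", "end" or "text".
    static final class Event {
        final String kind, name, text;
        final Attributes attrs;
        Event(String kind, String name, String text, Attributes attrs) {
            this.kind = kind; this.name = name; this.text = text; this.attrs = attrs;
        }
    }

    final List<Event> events = new ArrayList<>();

    RecordingFilter(XMLReader parent) { super(parent); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        // Copy the attributes: the parser is free to reuse the Attributes object.
        events.add(new Event("start", qName, null, new AttributesImpl(atts)));
        super.startElement(uri, localName, qName, atts); // relay to the next layer
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        events.add(new Event("end", qName, null, null));
        super.endElement(uri, localName, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        events.add(new Event("text", null, new String(ch, start, length), null));
        super.characters(ch, start, length);
    }

    // Parses xml with the recorder sitting between the parser and the downstream handler.
    static RecordingFilter record(String xml, ContentHandler downstream) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        RecordingFilter recorder = new RecordingFilter(reader);
        recorder.setContentHandler(downstream);
        recorder.parse(new InputSource(new StringReader(xml)));
        return recorder;
    }
}

// Stand-in for the object-building handler: it only cares about <title>.
class TitleHandler extends DefaultHandler {
    final StringBuilder title = new StringBuilder();
    private boolean inTitle;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        inTitle = "title".equals(qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inTitle) title.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        inTitle = false;
    }
}
```

After `RecordingFilter.record(xml, new TitleHandler())`, the handler has seen only what it asked for, while `events` holds the full stream, including elements the handler never looked at.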

I spent some time with the idea and it worked. It basically came down to proper chaining of XMLFilters. Here's what the unit test looks like; your code would do something similar:

final SAXParserFactory factory = SAXParserFactory.newInstance();
final SAXParser parser = factory.newSAXParser();

final RecorderProxy recorder = new RecorderProxy(parser.getXMLReader());
final ClipHolder clipHolder = new ClipHolder(recorder);

clipHolder.parse(new InputSource(new StringReader(srcXml)));

assertTrue(recorder.hasRecordingToReplay());

final Clip clip = clipHolder.getClip();
assertNotNull(clip);
assertEquals("My Title", clip.title);
assertEquals("Blah!", clip.category);
assertEquals(Clip.DATE_FORMAT.parse("12/24/2012"), clip.date);

clip.title = "My Title Updated";
clip.category = "Something else";

final ClipSerializer serializer = new ClipSerializer(recorder);
serializer.setClip(clip);

final TransformerFactory xsltFactory = TransformerFactory.newInstance();
final Transformer t = xsltFactory.newTransformer();
final StringWriter outXmlBuffer = new StringWriter();

t.transform(new SAXSource(serializer, 
            new InputSource()), new StreamResult(outXmlBuffer));

assertEquals(targetXml, outXmlBuffer.getBuffer().toString());

The important lines are:

  • your SAX events recorder is wrapped around the SAX parser
  • your Clip parser (ClipHolder) is wrapped around the recorder
  • when the XML is parsed, recorder will record everything and your ClipHolder will only look at what it knows about
  • you then do whatever you need to do with the clip object
  • the serializer is then wrapped around the recorder (basically re-mapping it onto itself)
  • you then work with the serializer, and it takes care of feeding the recorded events (delegating to the parent and registering itself as a ContentHandler) overlaid with what it has to say about the clip object.
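The record, replay and overlay chain can be sketched end to end as below. This is an illustration under assumptions, not the code from the repository: `Tape`, `Overlay` and `ReplayDemo` are hypothetical names, the known fields are assumed to be leaf elements containing only text, and the identity `Transformer` is assumed to accept a custom `XMLReader` through `SAXSource`, as the test above does.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical recorder: keeps each SAX event as a replayable action.
class Tape extends XMLFilterImpl {
    interface Step { void play(ContentHandler h) throws SAXException; }

    private final List<Step> steps = new ArrayList<>();

    Tape(XMLReader parent) { super(parent); }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) throws SAXException {
        Attributes copy = new AttributesImpl(atts); // the parser may reuse atts
        steps.add(h -> h.startElement(uri, local, qName, copy));
        super.startElement(uri, local, qName, atts);
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        steps.add(h -> h.endElement(uri, local, qName));
        super.endElement(uri, local, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        char[] copy = Arrays.copyOfRange(ch, start, start + length);
        steps.add(h -> h.characters(copy, 0, copy.length));
        super.characters(ch, start, length);
    }

    // An XMLReader whose parse() ignores its input and replays the tape instead.
    XMLReader replayer() {
        return new XMLFilterImpl() {
            @Override
            public void parse(InputSource ignored) throws SAXException {
                ContentHandler h = getContentHandler();
                h.startDocument();
                for (Step s : steps) s.play(h);
                h.endDocument();
            }
        };
    }
}

// Hypothetical overlay: swaps in new text for known leaf elements, relays the rest.
class Overlay extends XMLFilterImpl {
    private final Map<String, String> newValues;
    private boolean suppressText;

    Overlay(XMLReader parent, Map<String, String> newValues) {
        super(parent);
        this.newValues = newValues;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) throws SAXException {
        super.startElement(uri, local, qName, atts);
        String value = newValues.get(qName);
        if (value != null) {
            suppressText = true; // drop the recorded text of this element
            super.characters(value.toCharArray(), 0, value.length());
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (!suppressText) super.characters(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        suppressText = false;
        super.endElement(uri, local, qName);
    }
}

class ReplayDemo {
    static String roundTrip(String xml, Map<String, String> newValues) throws Exception {
        Tape tape = new Tape(SAXParserFactory.newInstance().newSAXParser().getXMLReader());
        tape.parse(new InputSource(new StringReader(xml))); // record only, no downstream handler

        Transformer t = TransformerFactory.newInstance().newTransformer(); // identity transform
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new SAXSource(new Overlay(tape.replayer(), newValues), new InputSource()),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("<data><title>My Title</title><rating>5</rating></data>",
                Collections.singletonMap("title", "My Title Updated")));
    }
}
```

Here `<rating>` stands in for a field written by some other app: the overlay never mentions it, yet it comes out of the replay unchanged. Keying the overlay on the element name alone is the simplification that breaks once the same name appears at different levels.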

Please find the DVR code and the Clip test over at GitHub. I hope it helps.

P.S. This is not a generic solution, and the whole record->replay+overlay concept is very rudimentary in the provided implementation; it is basically an illustration. If your XML is more complex and gets "hairy" (e.g. the same element names on different levels), the logic will need to be augmented. The concept will remain the same though.

Pavel Veller answered Oct 04 '22 22:10


You're right that SAX is probably not the best option if you want to keep the nodes that you haven't "consumed". You could still do it with some kind of "SAX store" that keeps the SAX events and replays them (there are a few implementations of such a thing around), but an object-model-based API would be much easier to use: you'd simply keep the complete object model and update just "your" nodes.

Of course, you can use DOM, which is the standard, but you may also want to consider alternatives that provide easier access to the specific nodes you'll be using in an arbitrary data model. Among them, JDOM (http://www.jdom.org/) and XOM (http://www.xom.nu/) are interesting candidates.
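With plain DOM, keeping the complete object model and updating only "your" nodes might look like the sketch below. `DomClip` and its methods are hypothetical names, and the lookup by tag name assumes each known field appears once in the document.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical clip backed by the full DOM tree: known fields are read and written
// through accessors, everything else simply stays in the tree.
class DomClip {
    private final Document doc;

    DomClip(String xml) throws Exception {
        doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Returns the text of the first element with this tag name, or null if absent.
    String get(String tag) {
        Node n = doc.getElementsByTagName(tag).item(0);
        return n == null ? null : n.getTextContent();
    }

    // Updates the element, creating it under the root when it does not exist yet.
    void set(String tag, String value) {
        Node n = doc.getElementsByTagName(tag).item(0);
        if (n == null) {
            n = doc.createElement(tag);
            doc.getDocumentElement().appendChild(n);
        }
        n.setTextContent(value);
    }

    // Serializes the whole tree, including elements this class never touched.
    String toXml() throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        DomClip clip = new DomClip("<data><title>My Title</title><rating>5</rating></data>");
        clip.set("title", "My Title Updated");
        System.out.println(clip.toXml());
    }
}
```

The trade-off is the one the question already measured: the tree costs more to build than a SAX pass, so this fits best when the document is parsed once per edit rather than once per list row.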

Eric van der Vlist answered Oct 04 '22 22:10