XML Diff and Merge

I think I have a rather unusual problem to solve, and I can't find enough information using Google. So here it goes:

I work on a Java EE SOA application that stores XML documents in Oracle XML DB. Whenever a document changes, I increment its version and move the previous version into a different table.

The new requirement is that I store only the differences between two versions, as XML, instead of the whole XML document.

  1. Is there a Java library that can compare XML documents? (XMLUnit, ...?)
  2. Is there a standard XML Schema for capturing XML differences?
  3. What transformation technology can I use to apply the "differences" to an XML document, so I can move back and forth between versions? (XSLT, Groovy, ...?)

I appreciate your time.

user53552 asked Jan 09 '09 22:01



1 Answer

In my last job, we had a similar problem: We had to detect changes, insertions, and deletions of specific items between two XML files. The files weren't arbitrary XML; they had to adhere to our XSD.

Our solution was to implement a kind of merge sort: Parse the files (using a SAX parser, not a DOM parser, to permit arbitrarily large files), and store the parsed data in separate HashMaps. Then, we compared the contents of the two maps using a merge-sort type of algorithm.
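A minimal sketch of that approach, not the original implementation: it assumes each record is an `<item>` element with a unique `id` attribute and plain text content (both hypothetical; the real files followed an XSD not shown here). It also uses `TreeMap` rather than `HashMap` so that the keys come out sorted, which is what lets the two maps be walked in lockstep like the merge step of a merge sort. The class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class XmlItemDiff {

    // Streams the document with SAX and records the text of every <item>,
    // keyed by its "id" attribute (element and attribute names are assumptions).
    static TreeMap<String, String> parseItems(String xml) throws Exception {
        TreeMap<String, String> items = new TreeMap<>();
        DefaultHandler handler = new DefaultHandler() {
            private String currentId;
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("item".equals(qName)) {
                    currentId = attrs.getValue("id");
                    text.setLength(0);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (currentId != null) text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("item".equals(qName) && currentId != null) {
                    items.put(currentId, text.toString().trim());
                    currentId = null;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return items;
    }

    // Walks both sorted maps in lockstep and reports deletions, insertions, and changes.
    static List<String> diff(TreeMap<String, String> oldItems, TreeMap<String, String> newItems) {
        List<String> changes = new ArrayList<>();
        Iterator<Map.Entry<String, String>> a = oldItems.entrySet().iterator();
        Iterator<Map.Entry<String, String>> b = newItems.entrySet().iterator();
        Map.Entry<String, String> x = a.hasNext() ? a.next() : null;
        Map.Entry<String, String> y = b.hasNext() ? b.next() : null;
        while (x != null || y != null) {
            int cmp = (x == null) ? 1 : (y == null) ? -1 : x.getKey().compareTo(y.getKey());
            if (cmp < 0) {            // key exists only in the old version
                changes.add("DELETED " + x.getKey());
                x = a.hasNext() ? a.next() : null;
            } else if (cmp > 0) {     // key exists only in the new version
                changes.add("INSERTED " + y.getKey());
                y = b.hasNext() ? b.next() : null;
            } else {                  // key exists in both: compare the values
                if (!x.getValue().equals(y.getValue())) changes.add("CHANGED " + x.getKey());
                x = a.hasNext() ? a.next() : null;
                y = b.hasNext() ? b.next() : null;
            }
        }
        return changes;
    }
}
```

Calling `XmlItemDiff.diff(XmlItemDiff.parseItems(oldXml), XmlItemDiff.parseItems(newXml))` yields one entry per difference, in key order. For real files you would pass a file stream to the parser instead of a string, and the list of changes could itself be serialized as XML if the differences need to be stored.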

Naturally, the larger the files got, the more memory pressure we experienced, so I ultimately wrote a FileHashMap class that pushed the HashMap's value space to random access files. While theoretically slower, this solution allowed our comparisons to work with very large files, without thrashing or OutOfMemoryError conditions. (A version of that FileHashMap class is available in this library: http://www.clapper.org/software/java/util/)
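To illustrate the general idea of pushing values out to a random access file, here is a rough sketch. It is not the `FileHashMap` class from the library linked above and does not reproduce its API; `DiskBackedValues` is a hypothetical name, and the design (keys and offsets in memory, value bytes appended to a file) is an assumption about how such a structure could work.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: keys stay in memory, value bytes live on disk.
public class DiskBackedValues implements AutoCloseable {
    private final RandomAccessFile file;
    // key -> {offset of the value in the file, length of the value in bytes}
    private final Map<String, long[]> index = new HashMap<>();

    public DiskBackedValues(File backingFile) throws IOException {
        this.file = new RandomAccessFile(backingFile, "rw");
        this.file.setLength(0); // start from an empty file
    }

    // Appends the value to the end of the file and remembers where it went.
    // Overwriting a key leaves the old bytes orphaned; this sketch never reclaims them.
    public void put(String key, String value) throws IOException {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        long offset = file.length();
        file.seek(offset);
        file.write(bytes);
        index.put(key, new long[] {offset, bytes.length});
    }

    // Seeks to the recorded offset and reads the value back, or returns null if absent.
    public String get(String key) throws IOException {
        long[] slot = index.get(key);
        if (slot == null) return null;
        byte[] bytes = new byte[(int) slot[1]];
        file.seek(slot[0]);
        file.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public int size() {
        return index.size();
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

The trade-off is the one described above: every `get` costs a disk seek, but heap usage grows only with the number of keys rather than with the total size of the values.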

I have no idea whether what I just described is even remotely close to what you need, but I thought I'd share it, just in case.

Good luck.

Brian Clapper answered Sep 20 '22 20:09