Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

XML Versioning Algorithm

I'm looking for an efficient way to compare and get differences between two XML-based parse trees.

What would you suggest to be the best way to store those differences? I would have done this:

XML A:

<w:p>
  <w:pPr>
    <w:spacing w:after="1"/>
  </w:pPr>
  <w:r>
    <w:t>World</w:t>
  </w:r>
</w:p>

XML B:

<w:p>
  <w:pPr>
    <w:spacing w:after="1"/>
  </w:pPr>
  <w:r>
    <w:t>ASDF</w:t>
  </w:r>
</w:p>

The algorithm determines that "World" has changed to "ASDF" and then stores:

div: <w:p><w:r><w:t>World</w:t> -> <w:p><w:r><w:t>ASDF</w:t>

Is this enough to cover all cases that might occur?

Does anybody know of a good way to do that? Any help would really be appreciated!

like image 620
Chris Avatar asked Nov 06 '22 21:11

Chris


1 Answers

It might get harder. Look at this example:

<w:p>
  <w:pPr>
    <w:spacing w:after="1"/>
  </w:pPr>
  <w:r>
    <w:t>World</w:t> <-- Case 1: this changes to <w:t>ASDF</w:t>
    <w:t>World</w:t> <-- Case 2: this changes to <w:t>ASDF</w:t>
  </w:r>
</w:p>

To be able to recognize both cases, you'll have to store one as

 div: <w:p><w:r><w:t>World</w:t> -> <w:p><w:r><w:t>ASDF</w:t>

and the other as

 div: <w:p><w:r><w:t>World</w:t><w:t>World</w:t> -> <w:p><w:r><w:t>World</w:t><w:t>ASDF</w:t>

or something similar (you might also want to add "w:p" closing tags to both of them to make them valid XML sub-trees).

In general, such programs can get very complicated, so I wouldn't recommend you to create something completely new but to either use some existing diff algorithm (most will be good enough even without parsing the XML structure) or to modify one of them to suit your needs.

like image 116
schnaader Avatar answered Nov 15 '22 06:11

schnaader