I'm looking for an efficient way to compare and get differences between two XML-based parse trees.
What would you suggest to be the best way to store those differences? I would have done this:
XML A:
<w:p>
<w:pPr>
<w:spacing w:after="1"/>
</w:pPr>
<w:r>
<w:t>World</w:t>
</w:r>
</w:p>
XML B:
<w:p>
<w:pPr>
<w:spacing w:after="1"/>
</w:pPr>
<w:r>
<w:t>ASDF</w:t>
</w:r>
</w:p>
The algorithm determines that "World" has changed to "ASDF" and then stores:
div: <w:p><w:r><w:t>World</w:t> -> <w:p><w:r><w:t>ASDF</w:t>
Is this enough to cover all cases that might occur?
Does anybody know of a good way to do that? Any help would really be appreciated!
It might get harder. Look at this example:
<w:p>
<w:pPr>
<w:spacing w:after="1"/>
</w:pPr>
<w:r>
<w:t>World</w:t> <-- Case 1: this changes to <w:t>ASDF</w:t>
<w:t>World</w:t> <-- Case 2: this changes to <w:t>ASDF</w:t>
</w:r>
</w:p>
To be able to recognize both cases, you'll have to store one as
div: <w:p><w:r><w:t>World</w:t> -> <w:p><w:r><w:t>ASDF</w:t>
and the other as
div: <w:p><w:r><w:t>World</w:t><w:t>World</w:t> -> <w:p><w:r><w:t>World</w:t><w:t>ASDF</w:t>
or something similar (you might also want to add "w:p" closing tags to both of them to make them valid XML sub-trees).
In general, such programs can get very complicated, so I wouldn't recommend you to create something completely new but to either use some existing diff algorithm (most will be good enough even without parsing the XML structure) or to modify one of them to suit your needs.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With