Can somebody recommend some papers (literature) or code snippets on tree-based diff algorithms for XML (based on the DOM tree)?
Thank you very much.
Git offers four diff algorithms: Myers, Minimal, Patience, and Histogram. They are used to compute the differences between two versions of the same file in two different commits. Minimal and Histogram are improved versions of Myers and Patience, respectively.
Typically, diff is used to show the changes between two versions of the same file. Modern implementations also support binary files. The output is called a "diff", or a patch, since the output can be applied with the Unix program patch.
The diff utility is a data comparison tool that calculates and displays the differences between two files. It displays the changes made in a standard format, such that both humans and machines can understand the changes and apply them: given one file and the changes, the other file can be created.
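The round trip described above (one file plus the changes gives back the other file) can be tried out with Python's standard `difflib` module. This is only a small line-based sketch; the sample lines and the file names `a.txt` and `b.txt` are made up for illustration, and it is not how the Unix `diff` and `patch` programs are implemented.

```python
import difflib

# Two versions of the same (made-up) file, as lists of lines.
old = ["alpha\n", "beta\n", "gamma\n"]
new = ["alpha\n", "beta changed\n", "gamma\n", "delta\n"]

# Human-readable unified diff, similar in layout to `diff -u` output.
patch_text = "".join(
    difflib.unified_diff(old, new, fromfile="a.txt", tofile="b.txt")
)
print(patch_text)

# ndiff produces a delta from which either side can be rebuilt.
delta = list(difflib.ndiff(old, new))
rebuilt_old = list(difflib.restore(delta, 1))
rebuilt_new = list(difflib.restore(delta, 2))
```

Note that `difflib.restore` works on the `ndiff` delta, not on the unified diff text; applying a unified diff is what the `patch` program is for.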
The diff command is used in Git to show the differences between versions of a file. Since Git is a version control system, tracking changes is vital to it. The diff command takes two inputs and reports the differences between them. These inputs need not be files; they can also be commits or branches, for example.
Here is one useful paper on the topic: http://pdf.aminer.org/000/301/327/x_diff_an_effective_change_detection_algorithm_for_xml_documents.pdf
Here is one tool you can experiment with: http://www.cs.hut.fi/~ctl/3dm/
You may also find the Java source for the above tool, which may be of great use.
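If you just want something to play with before reading the paper, here is a rough sketch of a tree-based comparison using Python's standard `xml.etree.ElementTree`. To be clear, this is not the X-Diff algorithm and not what 3DM does: it is a naive ordered comparison that pairs children by position only, so a single inserted sibling in the middle will show up as a run of changes. The function name `diff_trees`, the tuple layout, and the sample documents are all my own assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def diff_trees(a, b, path=None):
    """Return a list of (kind, path, old, new) tuples for two elements.

    Naive, ordered comparison: children are paired by position only.
    Tail text (text after a closing tag) is ignored.
    """
    path = path or f"/{a.tag}"
    if a.tag != b.tag:
        # Different tags: report the whole subtree as replaced.
        return [("replace", path, a.tag, b.tag)]

    changes = []
    if a.attrib != b.attrib:
        changes.append(("attrib", path, dict(a.attrib), dict(b.attrib)))
    text_a, text_b = (a.text or "").strip(), (b.text or "").strip()
    if text_a != text_b:
        changes.append(("text", path, text_a, text_b))

    kids_a, kids_b = list(a), list(b)
    # Recurse into children that exist at the same position in both trees.
    for i, (ca, cb) in enumerate(zip(kids_a, kids_b)):
        changes += diff_trees(ca, cb, f"{path}/{ca.tag}[{i}]")
    # Trailing children present only in the old tree are deletions.
    for i in range(len(kids_b), len(kids_a)):
        changes.append(("delete", f"{path}/{kids_a[i].tag}[{i}]", kids_a[i].tag, None))
    # Trailing children present only in the new tree are insertions.
    for i in range(len(kids_a), len(kids_b)):
        changes.append(("insert", f"{path}/{kids_b[i].tag}[{i}]", None, kids_b[i].tag))
    return changes


old_doc = ET.fromstring("<book id='1'><title>Diff</title><year>2001</year></book>")
new_doc = ET.fromstring(
    "<book id='2'><title>Diff</title><year>2002</year><isbn>x</isbn></book>"
)
for change in diff_trees(old_doc, new_doc):
    print(change)
```

For the sample documents this reports an attribute change on `book`, a text change on `year`, and an inserted `isbn` element. A real tree diff would add a matching step so that moved or reordered subtrees are recognized, which is the part the paper above is about.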