Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Anyone have a diff algorithm for rendered HTML? [closed]

I'm interested in seeing a good diff algorithm, possibly in Javascript, for rendering a side-by-side diff of two HTML pages. The idea would be that the diff would show the differences of the rendered HTML.

To clarify, I want to be able to see the side-by-side diffs as rendered output. So if I delete a paragraph, the side by side view would know to space things correctly.


@Josh exactly. Though maybe it would show the deleted text in red or something. The idea is that if I use a WYSIWYG editor for my HTML content, I don't want to have to switch to HTML to do diffs. I want to do it with two WYSIWYG editors side by side maybe. Or at least display diffs side-by-side in an end-user friendly matter.

like image 281
Haacked Avatar asked Aug 28 '08 06:08

Haacked


People also ask

What algorithm does diff use?

Myers Algorithm – human readable diffs This is used by tools such as Git Diff and GNU Diff. Original Myers time and space complexity is O(ND) where N is the sum of the lengths of both inputs and D is the size of the minimum edit script that converts one input to the other.

How is diff implemented?

The diff command is invoked from the command line, passing it the names of two files: diff original new . The output of the command represents the changes required to transform the original file into the new file. If original and new are directories, then diff will be run on each file that exists in both directories.


2 Answers

There's another nice trick you can use to significantly improve the look of a rendered HTML diff. Although this doesn't fully solve the initial problem, it will make a significant difference in the appearance of your rendered HTML diffs.

Side-by-side rendered HTML will make it very difficult for your diff to line up vertically. Vertical alignment is crucial for comparing side-by-side diffs. In order to improve the vertical alignment of a side-by-side diff, you can insert invisible HTML elements in each version of the diff at "checkpoints" where the diff should be vertically aligned. Then you can use a bit of client-side JavaScript to add vertical spacing around checkpoint until the sides line up vertically.

Explained in a little more detail:

If you want to use this technique, run your diff algorithm and insert a bunch of visibility:hidden <span>s or tiny <div>s wherever your side-by-side versions should match up, according to the diff. Then run JavaScript that finds each checkpoint (and its side-by-side neighbor) and adds vertical spacing to the checkpoint that is higher-up (shallower) on the page. Now your rendered HTML diff will be vertically aligned up to that checkpoint, and you can continue repairing vertical alignment down the rest of your side-by-side page.

like image 114
kamens Avatar answered Oct 27 '22 00:10

kamens


Over the weekend I posted a new project on codeplex that implements an HTML diff algorithm in C#. The original algorithm was written in Ruby. I understand you were looking for a JavaScript implementation, perhaps having one available in C# with source code could assist you to port the algorithm. Here is the link if you are interested: htmldiff.codeplex.com. You can read more about it here.

UPDATE: This library has been moved to GitHub.

like image 29
Rohland Avatar answered Oct 26 '22 23:10

Rohland