Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Stackoverflow's text diff [duplicate]

Tags:

algorithm

diff

When I go to (for example) review section here, I see these beautifully formatted text diffs:

text diff

Or the combined one:

combined text diff

Now, just how the hell is this done? I'd love to include this to my site, but I can't really figure out the algorithm. Is this documented somewhere? Is there an open-source implementation for this, preferably in PHP?

Thanks for any help.

like image 682
cypher Avatar asked Apr 17 '26 17:04

cypher


1 Answers

From the images that you posted, and my own (albeit little experience) it seems that the website uses a modification of the longest common sub sequence algorithm. This explains why it never shows rearrangement / shuffling of words.

The first modification is that instead of thinking of alphabets as atomic units, they consider words as atomic units. (also punctuation)

Secondly, the algorithm is relatively naive, it points out that you crossed out "work" when you actually just inserted a to there. It seems to just mark discontinuities of any kind (insertions, deletions, modifications) and crosses out one word or the whole discontinuation portion.

Thirdly, everything in the second list not a part of the first list is marked in green.

Seems relatively easy to implement. Check out some tutorial on dynamic programming.

like image 139
shebang Avatar answered Apr 21 '26 02:04

shebang



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!