What is the best (word or character)-based diff algorithm out there?

I want to find the diff between two strings on a per-word basis, since that may be faster than per-character; if per-character turns out to be faster, though, I'd rather do it that way.

Here is an example of what I want to achieve.

Source text:

Hello there!

Modified text:

Helay scere?

Diff:

Hel[lo](ay) [th](sc)ere[!](?)
The bracketed text is what was removed; the parenthesized text is what was added.

There is a rather hackish way to do this with a command-line tool such as opendiff, but it requires a newline character between every character, because opendiff is line-based.
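For reference, the preparation step for that workaround might look like the Ruby sketch below: it puts each character on its own line so that a line-based tool ends up comparing the strings character by character. The method name `one_char_per_line` is hypothetical, and replacing embedded newlines with a literal `\n` marker is an assumption, made so that a real newline in the text can't be confused with the separators added here.

```ruby
# Hypothetical helper for the line-based-diff workaround: one character per line.
# Embedded newlines become a literal "\n" marker (an assumption), so they don't
# collide with the line separators this method adds.
def one_char_per_line(text)
  text.chars.map { |c| c == "\n" ? "\\n" : c }.join("\n") + "\n"
end

print one_char_per_line("Hello there!")
```

The two exploded strings can then be written out with `File.write` and compared with `diff` or opendiff; consecutive changed lines map back to changed runs of characters.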

I'm using Ruby and haven't found any tools that do this, but the language isn't terribly important, since algorithms can be ported fairly easily.

Thanks.

asked Dec 05 '11 20:12 by NullVoxPopuli



2 Answers

You may want to look at the longest common subsequence (LCS) problem: http://en.wikipedia.org/wiki/Longest_common_subsequence_problem. An LCS-based diff is not hard to implement.
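As a rough illustration of that suggestion, here is a minimal LCS-based diff sketch in Ruby. It works on arrays of tokens, so the same code diffs by character (`String#chars`) or by word (whitespace-preserving tokens from `scan`), and it renders the result in the `[removed](added)` notation from the question. All method names (`lcs_table`, `diff_ops`, `render`, `char_diff`, `word_diff`) are hypothetical, and the dynamic-programming table costs O(n·m) time and memory, so treat this as a sketch for short strings rather than a production diff.

```ruby
# Minimal LCS-based diff sketch (hypothetical helpers, not a library API).
# Works on arrays of tokens, so it can diff by character or by word.

# table[i][j] holds the LCS length of a[i..] and b[j..].
def lcs_table(a, b)
  table = Array.new(a.size + 1) { Array.new(b.size + 1, 0) }
  (a.size - 1).downto(0) do |i|
    (b.size - 1).downto(0) do |j|
      table[i][j] =
        if a[i] == b[j]
          table[i + 1][j + 1] + 1
        else
          [table[i + 1][j], table[i][j + 1]].max
        end
    end
  end
  table
end

# Walks the table and returns [op, token] pairs, op being :same, :del or :add.
# On ties the deletion is emitted first.
def diff_ops(a, b)
  table = lcs_table(a, b)
  ops = []
  i = j = 0
  while i < a.size && j < b.size
    if a[i] == b[j]
      ops << [:same, a[i]]
      i += 1
      j += 1
    elsif table[i + 1][j] >= table[i][j + 1]
      ops << [:del, a[i]]
      i += 1
    else
      ops << [:add, b[j]]
      j += 1
    end
  end
  a[i..-1].each { |t| ops << [:del, t] }
  b[j..-1].each { |t| ops << [:add, t] }
  ops
end

# Renders ops in the question's notation: [removed](added).
def render(ops)
  out = +""
  removed = []
  added = []
  flush = lambda do
    out << "[#{removed.join}]" unless removed.empty?
    out << "(#{added.join})" unless added.empty?
    removed.clear
    added.clear
  end
  ops.each do |op, token|
    case op
    when :same
      flush.call
      out << token
    when :del then removed << token
    when :add then added << token
    end
  end
  flush.call
  out
end

def char_diff(old_text, new_text)
  render(diff_ops(old_text.chars, new_text.chars))
end

# Word tokens keep the whitespace runs, so joining the tokens restores the text.
def word_diff(old_text, new_text)
  render(diff_ops(old_text.scan(/\S+|\s+/), new_text.scan(/\S+|\s+/)))
end

puts char_diff("Hello there!", "Helay scere?") # prints Hel[lo](ay) [th](sc)ere[!](?)
puts word_diff("Hello there!", "Helay scere?") # prints [Hello](Helay) [there!](scere?)
```

Switching between per-character and per-word is only a change of tokenizer; word tokens usually mean a much smaller table. Dedicated diff tools typically use more efficient algorithms such as Myers' O(ND) difference algorithm instead of a full quadratic table.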

answered Nov 16 '22 02:11 by Victor Moroz



Have a look at https://github.com/pvande/differ. This gem does what you are looking for.

answered Nov 16 '22 03:11 by alex
