What is an Algorithm to Diff the Two Strings in the Same Way that SO Does on the Version Page?

Tags: c#, algorithm, ruby

I'm trying to diff two strings by phrase, similar to the way StackOverflow diffs two versions of a post on the version edits page. What algorithm would do this? Are there gems or other standard libraries that accomplish this?

EDIT: I've seen other diffing algorithms (Differ, for Ruby), and they produce output like the following:

>> o = 'now is the time when all good men.'
>> p = 'now some time the men time when all good men.'
>> Differ.diff_by_word(o,p).format_as(:html)
=> "now <del class=\"differ\">some</del><ins class=\"differ\">is</ins> <del class=\"differ\">time </del>the <del class=\"differ\">men </del>time when all good men."

Notice how the words are diffed on a per-word basis? I'd like some way of diffing by phrase instead, so that the code above would output:

=> "now <del class=\"differ\">some time the men</del><ins class=\"differ\">is the</ins> time when all good men."

Am I hoping for too much?


asked Sep 03 '09 04:09

aronchick

1 Answer

The algorithm you are looking for is Longest Common Subsequence (LCS); it does most of the work for you.

The outline is something along these lines:

Split the input and output by word.
Calculate the LCS of the input and output arrays.
Walk through the result and join up adjacent areas intelligently.
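The first two steps, plus the walk that turns the LCS into a per-word edit list, can be sketched in Ruby roughly as follows. This is a minimal, hypothetical implementation (`word_diff` is not from any gem): it splits on whitespace, fills the standard dynamic-programming LCS table, and walks it to emit one `[op, word]` pair per word using the `+`/`=`/`-` notation from this answer. It takes O(n·m) time and memory, which should be acceptable for post-sized text.

```ruby
# Hypothetical helper (not from any gem): word-level diff built on LCS.
# Returns [op, word] pairs: "=" in both, "-" only in old, "+" only in new.
def word_diff(old_text, new_text)
  a = old_text.split # split by word
  b = new_text.split

  # lcs[i][j] holds the LCS length of the suffixes a[i..] and b[j..].
  lcs = Array.new(a.size + 1) { Array.new(b.size + 1, 0) }
  (a.size - 1).downto(0) do |i|
    (b.size - 1).downto(0) do |j|
      lcs[i][j] =
        if a[i] == b[j]
          lcs[i + 1][j + 1] + 1
        else
          [lcs[i + 1][j], lcs[i][j + 1]].max
        end
    end
  end

  # Walk the table from the start, emitting one operation per word.
  script = []
  i = j = 0
  while i < a.size && j < b.size
    if a[i] == b[j]
      script << ["=", a[i]]
      i += 1
      j += 1
    elsif lcs[i][j + 1] >= lcs[i + 1][j]
      script << ["+", b[j]] # ties favor insertions; the choice is arbitrary
      j += 1
    else
      script << ["-", a[i]]
      i += 1
    end
  end
  a.drop(i).each { |w| script << ["-", w] } # leftover old words were deleted
  b.drop(j).each { |w| script << ["+", w] } # leftover new words were inserted
  script
end

p word_diff("hello world this is a test", "mister hello world")
# => [["+", "mister"], ["=", "hello"], ["=", "world"], ["-", "this"], ["-", "is"], ["-", "a"], ["-", "test"]]
```

When a word could be treated as either an insertion or a deletion with the same LCS length, this sketch emits the insertion first; a real implementation may prefer the opposite order.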

For example, say you have:

"hello world this is a test"

compared with:

"mister hello world"

The diff built from the LCS marks each word as inserted (+), unchanged (=), or deleted (-):

"mister" +
"hello" =
"world" =
"this" -
"is" -
"a" -
"test" -

Now you sprinkle on the special sauce when building the output: join the words back together while staying mindful of the previous action. The naive approach is simply to join adjacent sections that share the same action:

"mister" +
"hello world" =
"this is a test" -
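The naive joining rule can be sketched with Ruby's `Enumerable#chunk_while` (available since Ruby 2.3). `group_actions` is a hypothetical name; it takes a per-word edit list like the one in this answer and merges only adjacent words with identical actions, so it does none of the smarter phrase-level merging the question asks about.

```ruby
# Naive joining: merge runs of consecutive entries that share an action.
# Input is an edit list of [op, word] pairs, e.g. from an LCS-based diff.
def group_actions(script)
  script
    .chunk_while { |prev, cur| prev[0] == cur[0] }
    .map { |run| [run.first[0], run.map(&:last).join(" ")] }
end

script = [["+", "mister"], ["=", "hello"], ["=", "world"],
          ["-", "this"], ["-", "is"], ["-", "a"], ["-", "test"]]
p group_actions(script)
# => [["+", "mister"], ["=", "hello world"], ["-", "this is a test"]]
```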

Finally, you transform it to HTML:

<ins>mister</ins> hello world <del>this is a test</del>
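The HTML step might look like this minimal sketch. `to_html` is a hypothetical helper that expects grouped `[op, text]` pairs; it uses `ERB::Util.html_escape` from Ruby's standard library so that user-supplied text cannot inject markup, and joins the pieces with single spaces.

```ruby
require "erb"

# Hypothetical renderer: "+" becomes <ins>, "-" becomes <del>, "=" stays
# plain. Text is HTML-escaped so user content cannot inject markup.
def to_html(groups)
  parts = groups.map do |op, text|
    escaped = ERB::Util.html_escape(text)
    case op
    when "+" then "<ins>#{escaped}</ins>"
    when "-" then "<del>#{escaped}</del>"
    else escaped
    end
  end
  parts.join(" ")
end

groups = [["+", "mister"], ["=", "hello world"], ["-", "this is a test"]]
puts to_html(groups)
# prints: <ins>mister</ins> hello world <del>this is a test</del>
```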

Of course, the devil is in the details:

How do you handle tags?
Do you compare Markdown or HTML?
Are there any edge cases where the UI stops making sense?
Do you need special handling for punctuation?
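For the punctuation question, one possible approach (an assumption, not something this answer prescribes) is to tokenize with a regular expression instead of `split`. The hypothetical `tokenize` below makes punctuation and whitespace separate tokens, so a trailing period no longer stops "men." from matching "men", and the original text can be rebuilt exactly by joining the tokens. Ruby's `\w` matches only ASCII word characters; for Unicode text, `\p{Word}` (with `[^\p{Word}\s]`) is an alternative.

```ruby
# Hypothetical tokenizer: runs of (ASCII) word characters, runs of
# whitespace, and single punctuation marks each become a token, so
# "men." and "men" still share the token "men".
def tokenize(text)
  text.scan(/\w+|\s+|[^\w\s]/)
end

tokens = tokenize("all good men.")
p tokens
# => ["all", " ", "good", " ", "men", "."]
puts tokens.join # every character lands in some token, so this rebuilds the input
```

Because whitespace is kept as tokens, an LCS diff over this output will also compare spaces; whether that is desirable depends on how the result is rendered.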

answered Sep 19 '22 05:09

Sam Saffron
