 

compare text and get differences

I want to compare two strings (version one and version two) and get the differences in a format that I can convert to HTML on my own, similar to how you can view a post's edit history here on Stack Overflow, or how SVN tracks differences between revisions.

It must be a fully managed code library.

Something like this JavaScript, but I need to do it on the server side.

Peter asked Jul 18 '11 11:07





2 Answers

Google has something similar that is available in C#, though I have not looked at it in any depth. The demo looks pretty cool, though.

http://code.google.com/p/google-diff-match-patch/

Remy answered Oct 03 '22 19:10



I have a class library that does this. I'll post a link below, but I'll also describe how it does its job so that you can evaluate whether it fits your content.

Note that everything below is described in terms of characters, but if you think of each character as an element of a collection, you can implement the same algorithm for any type of content: characters of a string, lines of text, or collections of ORM objects.

The whole algorithm revolves around the longest common substring (LCS; note that this means substring, not subsequence) and is recursive.

First the algorithm tries to find the LCS between the two. This will be the longest section that is unchanged/identical between the two versions. The algorithm then considers these two parts to be "aligned".

For instance, here's how two example strings would be aligned:

      This long text has some text in the middle that will be found by LCS
This extra long text has some text in the middle that should be found by LCS
          ^-------- longest common substring --------^
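The LCS step on its own can be sketched as follows. This is a minimal illustration in Python, not code from the library mentioned in this answer; the function name `longest_common_substring` is made up for the example, and it uses the plain dynamic-programming approach, which takes time proportional to len(a) * len(b).

```python
def longest_common_substring(a, b):
    """Return (start_a, start_b, length) of the longest run of equal,
    consecutive elements shared by sequences a and b.

    Works on any indexable sequences whose elements support ==,
    for example strings, lists of words, or lists of lines.
    """
    best_len = 0
    best_a = 0
    best_b = 0
    # prev[j] holds the length of the common run ending at a[i-2], b[j-1]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended one element earlier in both
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_a = i - best_len
                    best_b = j - best_len
        prev = curr
    return best_a, best_b, best_len


old = "This long text has some text in the middle that will be found by LCS"
new = "This extra long text has some text in the middle that should be found by LCS"

# Character level: slice the match back out of the first string
start_a, start_b, length = longest_common_substring(old, new)
print(repr(old[start_a:start_a + length]))

# Word level: the same function, applied to lists of words
print(longest_common_substring(old.split(), new.split()))
```

Because the function only compares elements for equality, the same call works whether the elements are characters, words, or lines.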

Then it recursively applies itself to the portions before the aligned section and to the portions after it.

The final "result" could look like this (I'm using the underscore to indicate portions "not there" in one of the strings):

This ______long text has some text in the middle that ______will be found by LCS
This extra long text has some text in the middle that should____ be found by LCS

Then, as part of the recursive approach, each level of recursion returns a collection of "operations". Depending on whether there is an LCS, or a portion missing from either side, the operations are as follows:

  • If LCS, then it is a "copy" operation
  • If missing from first, then it is an "insert" operation
  • If missing from second, then it is a "delete" operation

So the above text would be:

  1. Copy 5 characters (This_; the underscore stands for a space, since code blocks here apparently strip spaces)
  2. Insert extra_
  3. Copy 43 characters (long text has some text in the middle that_)
  4. Insert should
  5. Delete 4 characters (will)
  6. Copy 16 characters (_be found by LCS)
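The recursion and the three operations can be sketched roughly like this. It is a minimal illustration in Python, not the library's actual code: the function name `diff` and the `(kind, items)` tuple shape are assumptions made for the example, and the LCS step is delegated to `SequenceMatcher.find_longest_match` from Python's standard `difflib` module. The inputs are split into words here, since the algorithm works on any collection of elements.

```python
from difflib import SequenceMatcher


def diff(a, b):
    """Return a list of (kind, items) operations that turn a into b.

    kind is "copy", "insert" or "delete"; items is a slice of a or b.
    Elements must be hashable (characters, words, lines, ...).
    """
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    ops = []

    def recurse(alo, ahi, blo, bhi):
        # Longest run of equal elements inside a[alo:ahi] and b[blo:bhi]
        m = matcher.find_longest_match(alo, ahi, blo, bhi)
        if m.size == 0:
            # Nothing in common: whatever is left is inserted or deleted
            if blo < bhi:
                ops.append(("insert", b[blo:bhi]))
            if alo < ahi:
                ops.append(("delete", a[alo:ahi]))
            return
        recurse(alo, m.a, blo, m.b)                      # portion before the match
        ops.append(("copy", a[m.a:m.a + m.size]))        # the aligned section
        recurse(m.a + m.size, ahi, m.b + m.size, bhi)    # portion after the match

    recurse(0, len(a), 0, len(b))
    return ops


old = "This long text has some text in the middle that will be found by LCS".split()
new = "This extra long text has some text in the middle that should be found by LCS".split()

for kind, words in diff(old, new):
    print(kind, " ".join(words))
```

On the word-split example strings this produces the same six steps as the list above: copy "This", insert "extra", copy the long middle section, insert "should", delete "will", copy "be found by LCS". Run on raw characters instead of words, the output is more fine-grained, because short accidental matches (such as the "l" shared by "will" and "should") also count as common substrings.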

The core of the algorithm is quite simple, and with the above text, you should be able to implement it yourself, if you want to.

There are some extra features in my class library, in particular for handling content that is similar to the changed text, so that you don't just get delete or insert operations but also modify operations. This mostly matters if you're comparing a list of something, like lines from text files.

The class library can be found here: DiffLib on GitHub, and you will also find it on NuGet for easy installation in Visual Studio 2010. It is written in C# for .NET 3.5 and up, so it will work for .NET 3.5 and 4.0, and since it is a binary release (all source code is on GitHub, though), you can use it from VB.NET as well.
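How the library decides that two pieces of content are "similar" is not described here. Purely as an assumption, one way to get modify operations is a post-processing pass that pairs an adjacent delete and insert when the two lines are close enough. The sketch below is in Python; the name `merge_similar`, the `(kind, text)` tuple shape and the 0.6 threshold are all hypothetical, and similarity is measured with `SequenceMatcher.ratio()` from the standard `difflib` module.

```python
from difflib import SequenceMatcher


def merge_similar(ops, threshold=0.6):
    """Turn adjacent delete/insert pairs into a single "modify" operation
    when the two lines are similar enough.

    ops is a list of (kind, text) tuples, one line of text per operation,
    with kind being "copy", "insert" or "delete". The threshold is an
    arbitrary value chosen for illustration.
    """
    merged = []
    i = 0
    while i < len(ops):
        kind, text = ops[i]
        if i + 1 < len(ops):
            next_kind, next_text = ops[i + 1]
            if {kind, next_kind} == {"insert", "delete"}:
                # Work out which side is the old line and which is the new one
                if kind == "delete":
                    old, new = text, next_text
                else:
                    old, new = next_text, text
                if SequenceMatcher(None, old, new).ratio() >= threshold:
                    merged.append(("modify", old, new))
                    i += 2
                    continue
        merged.append(ops[i])
        i += 1
    return merged


line_ops = [
    ("copy", "a = 1"),
    ("delete", "print(totl)"),
    ("insert", "print(total)"),
    ("insert", "return x"),
]
for op in merge_similar(line_ops):
    print(op)
```

A renderer can then show a modify operation as one changed line, with the differing characters highlighted, instead of an unrelated-looking removal and addition.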

Lasse V. Karlsen answered Oct 03 '22 18:10
