
Text difference algorithm

Tags: python, c#, diff

I need an algorithm that can compare two text files and highlight their differences. Even better, it should quantify the difference in a meaningful way: two similar files should get a higher similarity score than two dissimilar files, with "similar" meant in the ordinary sense. It sounds easy to implement, but it's not.

The implementation can be in C# or Python.

Thanks.

Graviton asked Sep 28 '08 10:09



1 Answer

I recommend taking a look at Neil Fraser's code and articles:

google-diff-match-patch

Currently available in Java, JavaScript, C++ and Python. Regardless of language, each library features the same API and the same functionality. All versions also have comprehensive test harnesses.

Neil Fraser: Diff Strategies - for theory and implementation notes
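
If you just want to try the idea in Python before pulling in a library, here is a minimal sketch using the standard library's difflib module. Note that this is a different tool from google-diff-match-patch, not its API. The helper names similarity and line_diff and the sample texts are made up for illustration. SequenceMatcher.ratio() gives a score between 0 and 1, and unified_diff produces the lines you would highlight.

```python
import difflib


def similarity(a, b):
    """Return a score in [0, 1]; 1.0 means the two texts are identical."""
    # autojunk=False turns off the heuristic that ignores very frequent
    # elements in long sequences, so the score is not affected by it.
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


def line_diff(a, b):
    """Return a unified diff of the two texts, compared line by line."""
    return list(difflib.unified_diff(
        a.splitlines(), b.splitlines(),
        fromfile="old", tofile="new", lineterm=""))


# Hypothetical sample inputs, only for demonstration.
old = "the quick brown fox\njumps over the lazy dog\n"
new = "the quick brown fox\nleaps over the lazy dog\n"
unrelated = "completely different content\n"

print(similarity(old, new))        # close to 1: only one word changed
print(similarity(old, unrelated))  # noticeably lower
print("\n".join(line_diff(old, new)))
```

Lines starting with "-" were removed and lines starting with "+" were added, so those are the ones to highlight. Comparing character by character like this gets slow on large files; for those, comparing lists of lines or words is probably the more practical choice.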

aku answered Sep 23 '22 01:09