 

How to implement word-level diffs in google/diff-match-patch (C#)

I am trying to implement word level matches in Google Diff Match Patch, but it is beating me up.

The result I get is:

 =I've never been =|-a-|=t=|= th=|-e-|=se places=|
 =I've never been =|=t=|+o+|= th=|+o+|=se places=|

The result I want is:

 =I've never been =|-at these-|= places=|
 =I've never been =|+to those+|= places=|

The documentation says:

make a copy of diff_linesToChars and call it diff_linesToWords. Look for the line that identifies the next line boundary: lineEnd = text.indexOf('\n', lineStart);

In the C# version, I found the line to change in diff_linesToCharsMunge, which I changed to:

lineEnd = text.Replace(@"/[\n\.,;:]/ g"," ").IndexOf(" ", lineStart);

However, the granularity does not change: it still finds differences at the character level.

I am calling:

List<Diff> differences = diffs.diff_main(linepair.Original, linepair.Corrected, true);
diffs.diff_cleanupSemantic(differences); 

I have stepped through the code to make sure that it hits the change I made (incidentally, there is a hardcoded minimum of 100 characters before it kicks in).

Alex Russell asked Nov 21 '25 05:11



1 Answer

I have created a sample .NET project that uses DiffMatchPatch. It is probably based on an older version of the DiffMatchPatch file, but the word-level and line-level diffs work.

DiffMatchPatchSample

For your sample text above, I get the following output:

at these | to those
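
For readers who want to see the idea behind that output, here is a minimal sketch of the word-to-character encoding that word mode relies on. It does not call the library at all: WordCodec, Encode, and Decode are hypothetical names made up for this illustration, and the tokenization rule (a word plus its trailing whitespace) is an assumption, not the rule used in the sample project. Two things appear to explain the character-level result in the question. First, string.Replace treats its first argument as a literal string, so a JavaScript-style regex literal such as "/[\n\.,;:]/ g" never matches anything. Second, diff_main with checklines set to true runs its own line mode internally and then re-diffs each replaced block character by character.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of diff-match-patch: maps each distinct
// word token to one char so a character-level diff compares whole words.
public sealed class WordCodec
{
    // Assumed tokenization: a run of non-whitespace plus the whitespace
    // that follows it, or a run of leading whitespace on its own.
    private static readonly Regex Token = new Regex(@"\S+\s*|\s+");

    // Index 0 is reserved so that no token is encoded as '\0'.
    private readonly List<string> words = new List<string> { string.Empty };
    private readonly Dictionary<string, int> ids = new Dictionary<string, int>();

    // Replaces every token in the text with a single char identifying it.
    public string Encode(string text)
    {
        var sb = new StringBuilder();
        foreach (Match m in Token.Matches(text))
        {
            if (!ids.TryGetValue(m.Value, out int id))
            {
                if (words.Count > char.MaxValue)
                    throw new InvalidOperationException("Too many distinct tokens for one char each.");
                id = words.Count;
                words.Add(m.Value);
                ids[m.Value] = id;
            }
            sb.Append((char)id);
        }
        return sb.ToString();
    }

    // Expands an encoded string (or an encoded diff fragment) back into words.
    public string Decode(string encoded)
    {
        var sb = new StringBuilder();
        foreach (char c in encoded)
            sb.Append(words[c]);
        return sb.ToString();
    }
}
```

With one WordCodec instance shared by both inputs, the two encoded strings could be passed to diff_main(encoded1, encoded2, false), and the text of each resulting Diff run through Decode before calling diff_cleanupSemantic. The library's own diff_linesToChars and diff_charsToLines follow the same pattern, though in the C# port they may not be public, so a subclass or a standalone helper like this one might be needed.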

Niketh Sudhakaran answered Nov 25 '25 00:11
