Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding difference between strings in Javascript

I'd like to compare two strings (a before and after) and detect exactly where and what changed between them.

For any change, I want to know:

  1. Starting position of the change (inclusive, starting at 0)
  2. Ending position of the change (inclusive, starting at 0) relative to the previous text
  3. The "change"

Assume that strings will change in only one place at a time (for example, never "Bill" -> "Kiln").

Additionally, I need the start and end positions to reflect the type of change:

  • If deletion, the start and end position should be the start and end positions of the deleted text, respectively
  • If replacement, the start and end position should be the start and end positions of the "deleted" text, respectively (the change will be the "added" text)
  • If insertion, the start and end positions should be the same; the entry point of the text
  • If no change, let start and end positions remain zero, with an empty change

For example:

"0123456789" -> "03456789"  
Start: 1, End: 2, Change: "" (deletion)

"03456789" -> "0123456789"  
Start: 1, End: 1, Change: "12" (insertion)

"Hello World!" -> "Hello Aliens!"  
Start: 6, End: 10, Change: "Aliens" (replacement)

"Hi" -> "Hi"  
Start: 0, End: 0, Change: "" (no change)

I was able to somewhat detect the positions of the changed text, but it doesn't work in all cases because in order to do that accurately, I need to know what kind of change is made.

var OldText = "My edited string!";
var NewText = "My first string!";

var ChangeStart = 0;
var NewChangeEnd = 0;
var OldChangeEnd = 0;
console.log("Comparing start:");
for (var i = 0; i < NewText.length; i++) {
    console.log(i + ": " + NewText[i] + " -> " + OldText[i]);
    if (NewText[i] != OldText[i]) {
        ChangeStart = i;
        break;
    }
}
console.log("Comparing end:");
// "Addition"?
if (NewText.length > OldText.length) {
    for (var i = 1; i < NewText.length; i++) {
        console.log(i + "(N: " + (NewText.length - i) + " O: " + (OldText.length - i) + ": " + NewText.substring(NewText.length - i, NewText.length - i + 1) + " -> " + OldText.substring(OldText.length - i, OldText.length - i + 1));
        if (NewText.substring(NewText.length - i, NewText.length - i + 1) != OldText.substring(OldText.length - i, OldText.length - i + 1)) {
            NewChangeEnd = NewText.length - i;
            OldChangeEnd = OldText.length - i;
            break;
        }
    }
// "Deletion"?
} else if (NewText.length < OldText.length) {
    for (var i = 1; i < OldText.length; i++) {
        console.log(i + "(N: " + (NewText.length - i) + " O: " + (OldText.length - i) + ": " + NewText.substring(NewText.length - i, NewText.length - i + 1) + " -> " + OldText.substring(OldText.length - i, OldText.length - i + 1));
        if (NewText.substring(NewText.length - i, NewText.length - i + 1) != OldText.substring(OldText.length - i, OldText.length - i + 1)) {
            NewChangeEnd = NewText.length - i;
            OldChangeEnd = OldText.length - i;
            break;
        }
    }
// Same length...
} else {
    // Do something
}
console.log("Change start: " + ChangeStart);
console.log("NChange end : " + NewChangeEnd);
console.log("OChange end : " + OldChangeEnd);
console.log("Change: " + OldText.substring(ChangeStart, OldChangeEnd + 1));

How do I tell whether or not an insertion, deletion, or replacement took place?


I've searched and came up with a few other similar questions, but they don't seem to help.

like image 300
Richard Avatar asked Nov 11 '14 04:11

Richard


People also ask

How do you find the difference between strings?

StringUtils. difference() returns the difference between two strings, returning the portion of the second string, which starts to differ from the first. StringUtils. indexOfDifference() returns the index at which the second string starts to diverge from the first.

How do you find the difference between two strings in TypeScript?

Use the strict equality operator (===) to check if two strings are equal in TypeScript, e.g. if (str1 === str2) {} . The strict equality operator returns true if the strings are equal, otherwise false is returned.

How do you check if a string is in another string JavaScript?

The includes() method returns true if a string contains a specified string. Otherwise it returns false . The includes() method is case sensitive.

Is it safe to compare JavaScript strings?

Summary. You're safe to compare strings directly when their characters are from the Basic Multilingual Plane. However, if the strings can contain combining characters, then it would be safer to normalize the compared strings to the same form using string. normalize() function.


2 Answers

I have gone through your code and your logic for matching string makes sense to me. It logs ChangeStart, NewChangeEnd and OldChangeEnd correctly and the algorithm flows alright. You just want to know if an insertion, deletion or replacement took place. Here's how I would go about it.

First of all, you need to make sure that after you have got the first point of mis-match i.e. ChangeStart when you then traverse the strings from the end, the index shouldn't cross ChangeStart.

I'll give you an example. Consider the following strings:

 var NewText = "Hello Worllolds!";
 var OldText = "Hello Worlds!";

 ChangeStart -> 10 //Makes sense
 OldChangeEnd -> 8
 NewChangeEnd -> 11

 console.log("Change: " + NewText.substring(ChangeStart, NewChangeEnd + 1)); 
 //Ouputs "lo"

The problem in this case is when it starts matching from the back, the flow is something like this:

 Comparing end: 
  1(N: 12 O: 12: ! -> !) 
  2(N: 11 O: 11: s -> s) 
  3(N: 10 O: 10: d -> d)  -> You need to stop here!

 //Although there is not a mismatch, but we have reached ChangeStart and 
 //we have already established that characters from 0 -> ChangeStart-1 match
 //That is why it outputs "lo" instead of "lol"

Assuming, what I just said makes sense, you just need to modify your for loops like so:

 if (NewText.length > OldText.length) {
 for (var i = 1; i < NewText.length && ((OldText.length-i)>=ChangeStart); i++) {
  ...

    NewChangeEnd = NewText.length - i -1;
    OldChangeEnd = OldText.length - i -1;
  if(//Mismatch condition reached){
         //break..That code is fine.
    }
 }

This condition -> (OldText.length-i)>=ChangeStart takes care of the anomaly that I mentioned and therefore the for loop automatically terminates if this condition is reached. However, just as I mentioned there might be situations where this condition is reached before a mis-match is encountered like I just demonstrated. So you need to update values of NewChangeEnd and OldChangeEnd as 1 less than the matched value. In case of a mis-match, you store the values appropriately.

Instead of an else -if we could just wrap those two conditions in a situation where we know NewText.length > OldText.length is definitely not true i.e. it is either a replacement or a deletion. Again NewText.length > OldText.length also means it could be a replacement or an insertion as per your examples, which makes sense. So the else could be something like:

else {
for (var i = 1; i < OldText.length && ((OldText.length-i)>=ChangeStart); i++) { 

    ...
    NewChangeEnd = NewText.length - i -1;
    OldChangeEnd = OldText.length - i -1;
  if(//Mismatch condition reached){
         //break..That code is fine.
    }
 }

If you have understood the minor changes thus far, identifying the specific cases is really simple:

  1. Deletion - Condition -> ChangeStart > NewChangeEnd. Deleted string from ChangeStart -> OldChangeEnd.

Deleted text -> OldText.substring(ChangeStart, OldChangeEnd + 1);

  1. Insertion - Condition -> ChangeStart > OldChangeEnd. Inserted string at ChangeStart.

Inserted text -> NewText.substring(ChangeStart, NewChangeEnd + 1);

  1. Replacement - If NewText != OldText and the above two conditions are not met, then it is a replacement.

Text in old string that got replaced -> OldText.substring(ChangeStart, OldChangeEnd + 1);

The replacement text -> NewText.substring(ChangeStart, NewChangeEnd + 1);

Start and end positions in the OldText that got replaced -> ChangeStart -> OldChangeEnd

I have created a jsfiddle incorporating the changes that I have mentioned in your code. You might want to check it out. Hope it gets you started in the right direction.

like image 174
Vivek Pradhan Avatar answered Oct 04 '22 16:10

Vivek Pradhan


Made my own slightly more performant version inspired by the same tactics as above (looking for differences from front to back and back to front)

function compareText(oldText, newText)
{
    var difStart,difEndOld,difEndNew;

    //from left to right - look up the first index where characters are different
    for(let i=0;i<oldText.length;i++)
    {
        if(oldText.charAt(i) !== newText.charAt(i))
        {
            difStart = i;
            break;
        }
    }

    //from right to left - look up the first index where characters are different
    //first calc the last indices for both strings
    var oldMax = oldText.length - 1;
    var newMax = newText.length - 1;
    for(let i=0;i<oldText.length;i++)
    {
        if(oldText.charAt(oldMax-i) !== newText.charAt(newMax-i))
        {
            //with different string lengths, the index will differ for the old and the new text
            difEndOld = oldMax-i;
            difEndNew = newMax-i;
            break;
        }
    }

    var removed = oldText.substr(difStart,difEndOld-difStart+1);
    var added = newText.substr(difStart,difEndNew-difStart+1);

    return [difStart,added,removed];
}
like image 32
Flion Avatar answered Oct 04 '22 17:10

Flion