I'd like to compare two strings (a before and after) and detect exactly where and what changed between them.
For any change, I want to know:
Assume that strings will change in only one place at a time (for example, never "Bill" -> "Kiln").
Additionally, I need the start and end positions to reflect the type of change:
For example:
"0123456789" -> "03456789"
Start: 1, End: 2, Change: "" (deletion)
"03456789" -> "0123456789"
Start: 1, End: 1, Change: "12" (insertion)
"Hello World!" -> "Hello Aliens!"
Start: 6, End: 10, Change: "Aliens" (replacement)
"Hi" -> "Hi"
Start: 0, End: 0, Change: "" (no change)
I was able to somewhat detect the positions of the changed text, but it doesn't work in all cases because in order to do that accurately, I need to know what kind of change is made.
var OldText = "My edited string!";
var NewText = "My first string!";
var ChangeStart = 0;
var NewChangeEnd = 0;
var OldChangeEnd = 0;
console.log("Comparing start:");
for (var i = 0; i < NewText.length; i++) {
console.log(i + ": " + NewText[i] + " -> " + OldText[i]);
if (NewText[i] != OldText[i]) {
ChangeStart = i;
break;
}
}
console.log("Comparing end:");
// "Addition"?
if (NewText.length > OldText.length) {
for (var i = 1; i < NewText.length; i++) {
console.log(i + "(N: " + (NewText.length - i) + " O: " + (OldText.length - i) + ": " + NewText.substring(NewText.length - i, NewText.length - i + 1) + " -> " + OldText.substring(OldText.length - i, OldText.length - i + 1));
if (NewText.substring(NewText.length - i, NewText.length - i + 1) != OldText.substring(OldText.length - i, OldText.length - i + 1)) {
NewChangeEnd = NewText.length - i;
OldChangeEnd = OldText.length - i;
break;
}
}
// "Deletion"?
} else if (NewText.length < OldText.length) {
for (var i = 1; i < OldText.length; i++) {
console.log(i + "(N: " + (NewText.length - i) + " O: " + (OldText.length - i) + ": " + NewText.substring(NewText.length - i, NewText.length - i + 1) + " -> " + OldText.substring(OldText.length - i, OldText.length - i + 1));
if (NewText.substring(NewText.length - i, NewText.length - i + 1) != OldText.substring(OldText.length - i, OldText.length - i + 1)) {
NewChangeEnd = NewText.length - i;
OldChangeEnd = OldText.length - i;
break;
}
}
// Same length...
} else {
// Do something
}
console.log("Change start: " + ChangeStart);
console.log("NChange end : " + NewChangeEnd);
console.log("OChange end : " + OldChangeEnd);
console.log("Change: " + OldText.substring(ChangeStart, OldChangeEnd + 1));
How do I tell whether or not an insertion, deletion, or replacement took place?
I've searched and came up with a few other similar questions, but they don't seem to help.
StringUtils. difference() returns the difference between two strings, returning the portion of the second string, which starts to differ from the first. StringUtils. indexOfDifference() returns the index at which the second string starts to diverge from the first.
Use the strict equality operator (===) to check if two strings are equal in TypeScript, e.g. if (str1 === str2) {} . The strict equality operator returns true if the strings are equal, otherwise false is returned.
The includes() method returns true if a string contains a specified string. Otherwise it returns false . The includes() method is case sensitive.
Summary. You're safe to compare strings directly when their characters are from the Basic Multilingual Plane. However, if the strings can contain combining characters, then it would be safer to normalize the compared strings to the same form using string. normalize() function.
I have gone through your code and your logic for matching string makes sense to me. It logs ChangeStart
, NewChangeEnd
and OldChangeEnd
correctly and the algorithm flows alright. You just want to know if an insertion, deletion or replacement took place. Here's how I would go about it.
First of all, you need to make sure that after you have got the first point of mis-match i.e. ChangeStart
when you then traverse the strings from the end, the index shouldn't cross ChangeStart
.
I'll give you an example. Consider the following strings:
var NewText = "Hello Worllolds!";
var OldText = "Hello Worlds!";
ChangeStart -> 10 //Makes sense
OldChangeEnd -> 8
NewChangeEnd -> 11
console.log("Change: " + NewText.substring(ChangeStart, NewChangeEnd + 1));
//Ouputs "lo"
The problem in this case is when it starts matching from the back, the flow is something like this:
Comparing end:
1(N: 12 O: 12: ! -> !)
2(N: 11 O: 11: s -> s)
3(N: 10 O: 10: d -> d) -> You need to stop here!
//Although there is not a mismatch, but we have reached ChangeStart and
//we have already established that characters from 0 -> ChangeStart-1 match
//That is why it outputs "lo" instead of "lol"
Assuming, what I just said makes sense, you just need to modify your for
loops like so:
if (NewText.length > OldText.length) {
for (var i = 1; i < NewText.length && ((OldText.length-i)>=ChangeStart); i++) {
...
NewChangeEnd = NewText.length - i -1;
OldChangeEnd = OldText.length - i -1;
if(//Mismatch condition reached){
//break..That code is fine.
}
}
This condition -> (OldText.length-i)>=ChangeStart
takes care of the anomaly that I mentioned and therefore the for
loop automatically terminates if this condition is reached. However, just as I mentioned there might be situations where this condition is reached before a mis-match is encountered like I just demonstrated. So you need to update values of NewChangeEnd
and OldChangeEnd
as 1 less than the matched value. In case of a mis-match, you store the values appropriately.
Instead of an else -if
we could just wrap those two conditions in a situation where we know NewText.length > OldText.length
is definitely not true i.e. it is either a replacement or a deletion. Again NewText.length > OldText.length
also means it could be a replacement or an insertion as per your examples, which makes sense. So the else
could be something like:
else {
for (var i = 1; i < OldText.length && ((OldText.length-i)>=ChangeStart); i++) {
...
NewChangeEnd = NewText.length - i -1;
OldChangeEnd = OldText.length - i -1;
if(//Mismatch condition reached){
//break..That code is fine.
}
}
If you have understood the minor changes thus far, identifying the specific cases is really simple:
ChangeStart > NewChangeEnd
. Deleted string from ChangeStart -> OldChangeEnd
.Deleted text -> OldText.substring(ChangeStart, OldChangeEnd + 1);
ChangeStart > OldChangeEnd
. Inserted string at ChangeStart
.Inserted text -> NewText.substring(ChangeStart, NewChangeEnd + 1);
NewText != OldText
and the above two conditions are not met, then it is a replacement. Text in old string that got replaced -> OldText.substring(ChangeStart, OldChangeEnd + 1);
The replacement text -> NewText.substring(ChangeStart, NewChangeEnd + 1);
Start and end positions in the OldText
that got replaced -> ChangeStart -> OldChangeEnd
I have created a jsfiddle incorporating the changes that I have mentioned in your code. You might want to check it out. Hope it gets you started in the right direction.
Made my own slightly more performant version inspired by the same tactics as above (looking for differences from front to back and back to front)
function compareText(oldText, newText)
{
var difStart,difEndOld,difEndNew;
//from left to right - look up the first index where characters are different
for(let i=0;i<oldText.length;i++)
{
if(oldText.charAt(i) !== newText.charAt(i))
{
difStart = i;
break;
}
}
//from right to left - look up the first index where characters are different
//first calc the last indices for both strings
var oldMax = oldText.length - 1;
var newMax = newText.length - 1;
for(let i=0;i<oldText.length;i++)
{
if(oldText.charAt(oldMax-i) !== newText.charAt(newMax-i))
{
//with different string lengths, the index will differ for the old and the new text
difEndOld = oldMax-i;
difEndNew = newMax-i;
break;
}
}
var removed = oldText.substr(difStart,difEndOld-difStart+1);
var added = newText.substr(difStart,difEndNew-difStart+1);
return [difStart,added,removed];
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With