Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Detect differences between two strings

I have 2 strings

string a = "foo bar"; string b = "bar foo"; 

and I want to detect the changes from a to b. What characters do I have to change, to get from a to b?

I think there must be a iteration over each character and detect if it was added, removed or remained equal. So this is my exprected result

'f' Remove 'o' Remove 'o' Remove ' ' Remove 'b' Equal 'a' Equal 'r' Equal ' ' Add 'f' Add 'o' Add 'o' Add 

class and enum for the result:

public enum Operation { Add,Equal,Remove }; public class Difference {     public Operation op { get; set; }     public char c { get; set; } } 

Here is my solution but the "Remove" case is not clear to me how the code has to look like

public static List<Difference> CalculateDifferences(string left, string right) {     int count = 0;     List<Difference> result = new List<Difference>();     foreach (char ch in left)     {         int index = right.IndexOf(ch, count);         if (index == count)         {             count++;             result.Add(new Difference() { c = ch, op = Operation.Equal });         }         else if (index > count)         {             string add = right.Substring(count, index - count);             result.AddRange(add.Select(x => new Difference() { c = x, op = Operation.Add }));             count += add.Length;         }         else         {             //Remove?         }     }     return result; } 

How does the code have to look like for removed characters?


Update - added a few more examples

example 1:

string a = "foobar"; string b = "fooar"; 

expected result:

'f' Equal 'o' Equal 'o' Equal 'b' Remove 'a' Equal 'r' Equal 

example 2:

string a = "asdfghjk"; string b = "wsedrftr"; 

expected result:

'a' Remove 'w' Add 's' Equal 'e' Add 'd' Equal 'r' Add 'f' Equal 'g' Remove 'h' Remove 'j' Remove 'k' Remove 't' Add 'r' Add 

Update:

Here is a comparison between Dmitry's and ingen's answer: https://dotnetfiddle.net/MJQDAO

like image 295
Impostor Avatar asked Sep 04 '18 08:09

Impostor


People also ask

How do you find the difference between two strings?

You can use StringUtils. difference(String first, String second).

Can you use == to compare two strings?

You should not use == (equality operator) to compare these strings because they compare the reference of the string, i.e. whether they are the same object or not. On the other hand, equals() method compares whether the value of the strings is equal, and not the object itself.

How do you find the difference between two strings in python?

Use the == and != operators to compare two strings for equality. Use the is operator to check if two strings are the same instance. Use the < , > , <= , and >= operators to compare strings alphabetically.

How do you compare two strings examples?

Solution. Following example compares two strings by using str compareTo (string), str compareToIgnoreCase(String) and str compareTo(object string) of string class and returns the ascii difference of first odd characters of compared strings.


1 Answers

You are looking for (minimum) edit distance / (minimum) edit sequence. You can find the theory of the process here:

https://web.stanford.edu/class/cs124/lec/med.pdf

Let's implement (simplest) Levenstein Distance / Sequence algorithm (for details see https://en.wikipedia.org/wiki/Levenshtein_distance). Let's start from helper classes (I've changed a bit your implementation of them):

  public enum EditOperationKind : byte {     None,    // Nothing to do     Add,     // Add new character     Edit,    // Edit character into character (including char into itself)     Remove,  // Delete existing character   };    public struct EditOperation {     public EditOperation(char valueFrom, char valueTo, EditOperationKind operation) {       ValueFrom = valueFrom;       ValueTo = valueTo;        Operation = valueFrom == valueTo ? EditOperationKind.None : operation;     }      public char ValueFrom { get; }     public char ValueTo {get ;}     public EditOperationKind Operation { get; }      public override string ToString() {       switch (Operation) {         case EditOperationKind.None:           return $"'{ValueTo}' Equal";         case EditOperationKind.Add:           return $"'{ValueTo}' Add";         case EditOperationKind.Remove:           return $"'{ValueFrom}' Remove";         case EditOperationKind.Edit:           return $"'{ValueFrom}' to '{ValueTo}' Edit";         default:           return "???";       }     }   } 

As far as I can see from the examples provided we don't have any edit operation, but add + remove; that's why I've put editCost = 2 when insertCost = 1, int removeCost = 1 (in case of tie: insert + remove vs. edit we put insert + remove). Now we are ready to implement Levenstein algorithm:

public static EditOperation[] EditSequence(   string source, string target,    int insertCost = 1, int removeCost = 1, int editCost = 2) {    if (null == source)     throw new ArgumentNullException("source");   else if (null == target)     throw new ArgumentNullException("target");    // Forward: building score matrix    // Best operation (among insert, update, delete) to perform    EditOperationKind[][] M = Enumerable     .Range(0, source.Length + 1)     .Select(line => new EditOperationKind[target.Length + 1])     .ToArray();    // Minimum cost so far   int[][] D = Enumerable     .Range(0, source.Length + 1)     .Select(line => new int[target.Length + 1])     .ToArray();    // Edge: all removes   for (int i = 1; i <= source.Length; ++i) {     M[i][0] = EditOperationKind.Remove;     D[i][0] = removeCost * i;   }    // Edge: all inserts    for (int i = 1; i <= target.Length; ++i) {     M[0][i] = EditOperationKind.Add;     D[0][i] = insertCost * i;   }    // Having fit N - 1, K - 1 characters let's fit N, K   for (int i = 1; i <= source.Length; ++i)     for (int j = 1; j <= target.Length; ++j) {       // here we choose the operation with the least cost       int insert = D[i][j - 1] + insertCost;       int delete = D[i - 1][j] + removeCost;       int edit = D[i - 1][j - 1] + (source[i - 1] == target[j - 1] ? 0 : editCost);        int min = Math.Min(Math.Min(insert, delete), edit);        if (min == insert)          M[i][j] = EditOperationKind.Add;       else if (min == delete)         M[i][j] = EditOperationKind.Remove;       else if (min == edit)         M[i][j] = EditOperationKind.Edit;        D[i][j] = min;     }    // Backward: knowing scores (D) and actions (M) let's building edit sequence   List<EditOperation> result =      new List<EditOperation>(source.Length + target.Length);    for (int x = target.Length, y = source.Length; (x > 0) || (y > 0);) {     EditOperationKind op = M[y][x];      if (op == EditOperationKind.Add) {       x -= 1;       result.Add(new EditOperation('\0', target[x], op));     }     else if (op == EditOperationKind.Remove) {       y -= 1;       result.Add(new EditOperation(source[y], '\0', op));     }     else if (op == EditOperationKind.Edit) {       x -= 1;       y -= 1;       result.Add(new EditOperation(source[y], target[x], op));     }     else // Start of the matching (EditOperationKind.None)       break;   }    result.Reverse();    return result.ToArray(); } 

Demo:

var sequence = EditSequence("asdfghjk", "wsedrftr");   Console.Write(string.Join(Environment.NewLine, sequence)); 

Outcome:

'a' Remove 'w' Add 's' Equal 'e' Add 'd' Equal 'r' Add 'f' Equal 'g' Remove 'h' Remove 'j' Remove 'k' Remove 't' Add 'r' Add 
like image 152
Dmitry Bychenko Avatar answered Oct 02 '22 15:10

Dmitry Bychenko