I have 2 strings
string a = "foo bar"; string b = "bar foo";
and I want to detect the changes from a
to b
. What characters do I have to change, to get from a
to b
?
I think there must be a iteration over each character and detect if it was added, removed or remained equal. So this is my exprected result
'f' Remove 'o' Remove 'o' Remove ' ' Remove 'b' Equal 'a' Equal 'r' Equal ' ' Add 'f' Add 'o' Add 'o' Add
class and enum for the result:
public enum Operation { Add,Equal,Remove }; public class Difference { public Operation op { get; set; } public char c { get; set; } }
Here is my solution but the "Remove" case is not clear to me how the code has to look like
public static List<Difference> CalculateDifferences(string left, string right) { int count = 0; List<Difference> result = new List<Difference>(); foreach (char ch in left) { int index = right.IndexOf(ch, count); if (index == count) { count++; result.Add(new Difference() { c = ch, op = Operation.Equal }); } else if (index > count) { string add = right.Substring(count, index - count); result.AddRange(add.Select(x => new Difference() { c = x, op = Operation.Add })); count += add.Length; } else { //Remove? } } return result; }
How does the code have to look like for removed characters?
Update - added a few more examples
example 1:
string a = "foobar"; string b = "fooar";
expected result:
'f' Equal 'o' Equal 'o' Equal 'b' Remove 'a' Equal 'r' Equal
example 2:
string a = "asdfghjk"; string b = "wsedrftr";
expected result:
'a' Remove 'w' Add 's' Equal 'e' Add 'd' Equal 'r' Add 'f' Equal 'g' Remove 'h' Remove 'j' Remove 'k' Remove 't' Add 'r' Add
Update:
Here is a comparison between Dmitry's and ingen's answer: https://dotnetfiddle.net/MJQDAO
You can use StringUtils. difference(String first, String second).
You should not use == (equality operator) to compare these strings because they compare the reference of the string, i.e. whether they are the same object or not. On the other hand, equals() method compares whether the value of the strings is equal, and not the object itself.
Use the == and != operators to compare two strings for equality. Use the is operator to check if two strings are the same instance. Use the < , > , <= , and >= operators to compare strings alphabetically.
Solution. Following example compares two strings by using str compareTo (string), str compareToIgnoreCase(String) and str compareTo(object string) of string class and returns the ascii difference of first odd characters of compared strings.
You are looking for (minimum) edit distance / (minimum) edit sequence. You can find the theory of the process here:
https://web.stanford.edu/class/cs124/lec/med.pdf
Let's implement (simplest) Levenstein Distance / Sequence algorithm (for details see https://en.wikipedia.org/wiki/Levenshtein_distance). Let's start from helper classes (I've changed a bit your implementation of them):
public enum EditOperationKind : byte { None, // Nothing to do Add, // Add new character Edit, // Edit character into character (including char into itself) Remove, // Delete existing character }; public struct EditOperation { public EditOperation(char valueFrom, char valueTo, EditOperationKind operation) { ValueFrom = valueFrom; ValueTo = valueTo; Operation = valueFrom == valueTo ? EditOperationKind.None : operation; } public char ValueFrom { get; } public char ValueTo {get ;} public EditOperationKind Operation { get; } public override string ToString() { switch (Operation) { case EditOperationKind.None: return $"'{ValueTo}' Equal"; case EditOperationKind.Add: return $"'{ValueTo}' Add"; case EditOperationKind.Remove: return $"'{ValueFrom}' Remove"; case EditOperationKind.Edit: return $"'{ValueFrom}' to '{ValueTo}' Edit"; default: return "???"; } } }
As far as I can see from the examples provided we don't have any edit operation, but add + remove; that's why I've put editCost = 2
when insertCost = 1
, int removeCost = 1
(in case of tie: insert + remove
vs. edit
we put insert + remove
). Now we are ready to implement Levenstein algorithm:
public static EditOperation[] EditSequence( string source, string target, int insertCost = 1, int removeCost = 1, int editCost = 2) { if (null == source) throw new ArgumentNullException("source"); else if (null == target) throw new ArgumentNullException("target"); // Forward: building score matrix // Best operation (among insert, update, delete) to perform EditOperationKind[][] M = Enumerable .Range(0, source.Length + 1) .Select(line => new EditOperationKind[target.Length + 1]) .ToArray(); // Minimum cost so far int[][] D = Enumerable .Range(0, source.Length + 1) .Select(line => new int[target.Length + 1]) .ToArray(); // Edge: all removes for (int i = 1; i <= source.Length; ++i) { M[i][0] = EditOperationKind.Remove; D[i][0] = removeCost * i; } // Edge: all inserts for (int i = 1; i <= target.Length; ++i) { M[0][i] = EditOperationKind.Add; D[0][i] = insertCost * i; } // Having fit N - 1, K - 1 characters let's fit N, K for (int i = 1; i <= source.Length; ++i) for (int j = 1; j <= target.Length; ++j) { // here we choose the operation with the least cost int insert = D[i][j - 1] + insertCost; int delete = D[i - 1][j] + removeCost; int edit = D[i - 1][j - 1] + (source[i - 1] == target[j - 1] ? 0 : editCost); int min = Math.Min(Math.Min(insert, delete), edit); if (min == insert) M[i][j] = EditOperationKind.Add; else if (min == delete) M[i][j] = EditOperationKind.Remove; else if (min == edit) M[i][j] = EditOperationKind.Edit; D[i][j] = min; } // Backward: knowing scores (D) and actions (M) let's building edit sequence List<EditOperation> result = new List<EditOperation>(source.Length + target.Length); for (int x = target.Length, y = source.Length; (x > 0) || (y > 0);) { EditOperationKind op = M[y][x]; if (op == EditOperationKind.Add) { x -= 1; result.Add(new EditOperation('\0', target[x], op)); } else if (op == EditOperationKind.Remove) { y -= 1; result.Add(new EditOperation(source[y], '\0', op)); } else if (op == EditOperationKind.Edit) { x -= 1; y -= 1; result.Add(new EditOperation(source[y], target[x], op)); } else // Start of the matching (EditOperationKind.None) break; } result.Reverse(); return result.ToArray(); }
Demo:
var sequence = EditSequence("asdfghjk", "wsedrftr"); Console.Write(string.Join(Environment.NewLine, sequence));
Outcome:
'a' Remove 'w' Add 's' Equal 'e' Add 'd' Equal 'r' Add 'f' Equal 'g' Remove 'h' Remove 'j' Remove 'k' Remove 't' Add 'r' Add
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With