To calculate the minimum edit distance (the minimum number of insertions, deletions, and substitutions required to transform one word into another), a dynamic programming solution is based on a recurrence relation in which the last characters of both strings are examined. The details are at https://en.wikipedia.org/wiki/Wagner%E2%80%93Fischer_algorithm.
Descriptions of this algorithm are everywhere on the Internet when it comes to edit distance, but all of them just assert its correctness without proof. By the definition of edit distance, you can insert, delete, or substitute characters anywhere in the string, not just at the end. So how do you prove that this recurrence relation actually holds?
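For concreteness, the recurrence in question can be written as a plain recursive function; this is a minimal sketch (the function name is mine), exponential without memoization:

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum insertions/deletions/substitutions to turn a into b."""
    if not a:                  # base case (0, j): insert all of b
        return len(b)
    if not b:                  # base case (i, 0): delete all of a
        return len(a)
    if a[-1] == b[-1]:         # last characters already match: no edit needed
        return edit_distance(a[:-1], b[:-1])
    return 1 + min(
        edit_distance(a[:-1], b),       # delete the last character of a
        edit_distance(a, b[:-1]),       # insert the last character of b
        edit_distance(a[:-1], b[:-1]),  # substitute a's last character
    )
```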
You can view DP as a way to speed up recursion, and the easiest way to prove a recursive algorithm correct is nearly always by induction: Show that it's correct on some small base case(s), and then show that, assuming it is correct for a problem of size n, it is also correct for a problem of size n+1.
In computational linguistics and computer science, edit distance is a string metric, i.e., a way of quantifying how dissimilar two strings (e.g., words) are, measured by counting the minimum number of operations required to transform one string into the other.
The minimum edit distance between two strings str1 and str2 is thus the minimum number of insert/delete/substitute operations required to transform str1 into str2. For example, if str1 = "ab" and str2 = "abc", then a single insertion of the character 'c' transforms str1 into str2.
The edit distance problem can be solved by many different approaches, but the standard efficient approach is dynamic programming, which takes O(N * M) time, where N and M are the lengths of the two strings.
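A minimal bottom-up sketch of that O(N * M) table (the names edit_distance_dp and d are my own); entry d[i][j] holds the answer to the subproblem "edit the first i characters of str1 into the first j characters of str2":

```python
def edit_distance_dp(a: str, b: str) -> int:
    """Bottom-up Wagner-Fischer: d[i][j] = min edits turning a[:i] into b[:j]."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]  # (n+1) x (m+1) subproblems
    for i in range(n + 1):
        d[i][0] = i        # delete all i characters of a's prefix
    for j in range(m + 1):
        d[0][j] = j        # insert all j characters of b's prefix
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # delete a[i-1]
                d[i][j - 1] + 1,        # insert b[j-1]
                d[i - 1][j - 1] + sub,  # match or substitute
            )
    return d[n][m]

print(edit_distance_dp("ab", "abc"))  # 1, matching the example above
```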
First, as I said in the comment, you can view dynamic programming as a way to speed up recursion, and the easiest way to prove a recursive algorithm correct is nearly always by induction: Show that it's correct on some small base case(s), and then show that, assuming it is correct for a problem of size n, it is also correct for a problem of size n+1. This way the proof closely mirrors the recursive structure.
The usual recursion for this problem breaks the problem "Find the minimum cost to edit string A into string B" into the (|A|+1)(|B|+1) subproblems "Find the minimum cost to edit the first i characters of string A into the first j characters of string B", for all 0 <= i <= |A| and 0 <= j <= |B|.
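Memoizing the recursion over exactly these (i, j) subproblems is what turns it into the DP; a minimal top-down sketch (names are mine):

```python
from functools import lru_cache

def edit_distance_memo(a: str, b: str) -> int:
    """Top-down DP: solve(i, j) = min cost to edit a[:i] into b[:j]."""
    @lru_cache(maxsize=None)
    def solve(i: int, j: int) -> int:
        if i == 0:
            return j                    # insert the first j characters of b
        if j == 0:
            return i                    # delete the first i characters of a
        sub = 0 if a[i - 1] == b[j - 1] else 1
        return min(
            solve(i - 1, j) + 1,        # delete a[i-1]
            solve(i, j - 1) + 1,        # insert b[j-1]
            solve(i - 1, j - 1) + sub,  # match or substitute
        )
    return solve(len(a), len(b))
```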
Usually with induction, we can pick a small number of simple base cases (perhaps just one), show that we can easily compute the correct answers for them, and it's obvious how the correctness of all other cases will be implied by the correctness of the base cases, because regardless of what case we start with, there will be just a single "chain" of assumptions that needs to be satisfied, and this chain clearly must end at one of our base cases. However for this particular problem, to show that the algorithm solves the (i, j) subproblem optimally, we need to first assume that it solves the (i-1, j), (i, j-1) and (i-1, j-1) subproblems optimally (since if any of the answers to those subproblems was incorrect, it could easily compute a totally wrong answer for the (i, j) subproblem). This will require a more complicated induction than usual: instead of a single "chain" of assumptions that need to be satisfied, we now have a branching "tree" of assumptions, with (up to) 3 children at each point. We need to pick base cases in such a way that for any (i, j), this entire tree of assumptions eventually "stops", i.e., every path in it eventually hits a base case where its assumptions are satisfied.
To recap: To prove our solution to (i, j) optimal, we must assume we have optimal solutions for (i-1, j), (i, j-1), and (i-1, j-1); to satisfy that assumption for, say, (i-1, j) (that is, to prove our solution to (i-1, j) is optimal), we need to assume we have optimal solutions for (i-2, j), (i-1, j-1) and (i-2, j-1); and so on. How do we choose base case(s) that will work? While traversing down this tree, i.e., in going from proving our solution to the subproblem (i, j) correct to proving our solution to any one of its "child" subproblems (i', j') correct, we notice that neither subproblem co-ordinate ever increases, and at least one of them decreases by exactly 1.
Basically, if we take a single step down this tree, at least one of our two "subproblem co-ordinates" (i or j) decreases, but never by more than 1. What that means is that if we keep descending through the tree, then regardless of which particular "children" subproblems we pick on the way down, we must eventually hit a subproblem with a 0 for (at least) one of its co-ordinates -- i.e., an (i'', 0) subproblem for some 0 <= i'' <= |A| or a (0, j'') subproblem for some 0 <= j'' <= |B|. And that means that if we make those subproblems our base cases, we can ensure that every path in the induction tree hits a base case where its assumptions are satisfied and can therefore stop.
Luckily, these base cases are indeed easy to compute optimal answers to. Consider a problem (i, 0): This problem asks for the minimum cost required to change the first i characters of string A into the first 0 characters of string B. Clearly the best (only!) way to do this is to delete all i characters in the prefix of A, for a cost of i. Likewise the problem (0, j) asks for the minimum cost required to change the first 0 characters of A into the first j characters of B: just as clearly, the best way to do this is to simply insert all j characters in this prefix of B, for a cost of j.
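These two families of base cases are exactly the first column and first row of the DP table; repeating that initialization from the sketch above as its own small function:

```python
def base_cases(n: int, m: int) -> list[list[int]]:
    """Table with the base-case answers filled in, everything else 0."""
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i   # (i, 0): delete all i characters, cost i
    for j in range(m + 1):
        d[0][j] = j   # (0, j): insert all j characters, cost j
    return d
```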
All that remains is the inductive step: showing that we compute the answer to the (i, j) subproblem correctly, under the assumption that we have computed the answers to the (i-1, j), (i, j-1) and (i-1, j-1) subproblems correctly. The trick is to see that in every possible way of editing the first i characters of A into the first j characters of B, there are actually only 3 possible things that we could do with the last character in each of these prefixes (that is, the i-th character of A and the j-th character of B):

1. Pair them up: edit the first i-1 characters of A into the first j-1 characters of B (optimally, at the cost of the answer to (i-1, j-1)), then turn A's i-th character into B's j-th character, at an extra cost of 1 if they differ, or 0 if they are already equal.
2. Delete A's i-th character: edit the first i-1 characters of A into the first j characters of B (at the cost of the answer to (i-1, j)), then delete A's i-th character, at an extra cost of 1.
3. Insert B's j-th character: edit the first i characters of A into the first j-1 characters of B (at the cost of the answer to (i, j-1)), then append B's j-th character, at an extra cost of 1.
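In code, this three-way case analysis is exactly the expression used to fill each table cell; pulled out as its own function (a sketch, assuming d already holds correct answers to the three smaller subproblems):

```python
def step(d: list[list[int]], a: str, b: str, i: int, j: int) -> int:
    """Answer to (i, j) for i, j >= 1, given correct neighbouring answers."""
    sub = 0 if a[i - 1] == b[j - 1] else 1
    return min(
        d[i - 1][j] + 1,        # delete A's i-th character
        d[i][j - 1] + 1,        # insert B's j-th character
        d[i - 1][j - 1] + sub,  # pair up the last characters
    )
```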
Since those 3 things are the only possible things that we could do, and for each of the 3 we have calculated the overall lowest cost of doing that thing, the overall best thing to do must be the best of the 3 of them. This proves that we correctly calculate the lowest cost needed to edit the first i characters of A into the first j characters of B, for any i and j -- so in particular, it's true for i = |A| and j = |B|, that is, for editing the complete string A into the complete string B.
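As a quick usage example (assuming the sketches above are in scope; the well-known distance for this pair is 3):

```python
print(edit_distance_dp("kitten", "sitting"))    # 3: k->s, e->i, insert g
print(edit_distance_memo("kitten", "sitting"))  # 3, via the top-down route
```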