Proof of correct of the dynamic programming approach to min edit distance

Tags:

To calculate min edit distance (the minimum amount of insertions, deletions and substitutions required to transform one word to another), a dynamic programming solution is based on the recurrence relation, where the last character of both string is examined. The details are in https://en.wikipedia.org/wiki/Wagner%E2%80%93Fischer_algorithm.

The description of this algorithm is everywhere on the Internet when it comes to edit distance, but all of them just asserts its correctness without proof. By definition of edit distance, you can insert, delete or substitute characters in the middle, not just at the end. Then how do you prove that this recurrence relation actually holds?

985

asked Mar 07 '16 03:03

Siyuan Ren

1 Answers

Use Induction to Prove Recursive Algorithms Correct

First, as I said in the comment, you can view dynamic programming as a way to speed up recursion, and the easiest way to prove a recursive algorithm correct is nearly always by induction: Show that it's correct on some small base case(s), and then show that, assuming it is correct for a problem of size n, it is also correct for a problem of size n+1. This way the proof closely mirrors the recursive structure.

The usual recursion for this problem breaks the problem "Find the minimum cost to edit string A into string B" into the (|A|+1)(|B|+1) subproblems "Find the minimum cost to edit the first i characters of string A into the first j characters of string B", for all 0 <= i <= |A| and 0 <= j <= |B|.

Choosing Base Cases

Usually with induction, we can pick a small number of simple base cases (perhaps just one), show that we can easily compute the correct answers for them, and it's obvious how the correctness of all other cases will be implied by the correctness of the base cases, because regardless of what case we start with, there will be just a single "chain" of assumptions that needs to be satisfied, and this chain clearly must end at one of our base cases. However for this particular problem, to show that the algorithm solves the (i, j) subproblem optimally, we need to first assume that it solves the (i-1, j), (i, j-1) and (i-1, j-1) subproblems optimally (since if any of the answers to those subproblems was incorrect, it could easily compute a totally wrong answer for the (i, j) subproblem). This will require a more complicated induction than usual: instead of a single "chain" of assumptions that need to be satisfied, we now have a branching "tree" of assumptions, with (up to) 3 children at each point. We need to pick base cases in such a way that for any (i, j), this entire tree of assumptions eventually "stops", i.e., every path in it eventually hits a base case where its assumptions are satisfied.

To recap: To prove our solution to (i, j) optimal, we must assume we have optimal solutions for (i-1, j), (i, j-1), and (i-1, j-1); to satisfy that assumption for, say, (i-1, j) (that is, to prove our solution to (i-1, j) is optimal), we need to assume we have optimal solutions for (i-2, j), (i-1, j-1) and (i-2, j-1), etc., etc. How to choose base case(s) that will work? While traversing down this tree, i.e., in going from proving our solution to the subproblem (i, j) correct to proving our solution to any one of its "child" subproblems (i', j') correct, we notice that:

At least one of i' < i or j' < j holds.
We never "skip over" subproblems -- that is, we never have i-i' >= 2, or j-j' >= 2.

Basically, if we take a single step down this tree, at least one of our two "subproblem co-ordinates" (i or j) decreases, but never by more than 1. What that means is that if we keep descending through the tree, then regardless of which particular "children" subproblems we pick on the way down, we must eventually hit a subproblem with a 0 for (at least) one of its co-ordinates -- i.e. a (i'', 0) subproblem for some 0 <= i'' <= |A| or a (0, j'') subproblem for some 0 <= j'' <= |B|. And what that means is that if we make those subproblems our base cases, we can ensure that every path in the induction tree hits a base case where its assumptions are satisfied and can therefore stop.

Luckily, these base cases are indeed easy to compute optimal answers to. Consider a problem (i, 0): This problem asks for the minimum cost required to change the first i characters of string A into the first 0 characters of string B. Clearly the best (only!) way to do this is to delete all i characters in the prefix of A, for a cost of i. Likewise the problem (0, j) asks for the minimum cost required to change the first 0 characters of A into the first j characters of B: just as clearly, the best way to do this is to simply insert all j characters in this prefix of B, for a cost of j.

Inductive Step

All that remains is the inductive step: Showing that we compute the answer to the (i, j) subproblem correctly, under the assumption that we have computed the answers to the (i-1, j), (i, j-1) and (i-1, j-1) subproblems correctly. The trick is to see that in every possible way of editing the first i characters of A into the first j characters of B, there are actually only 3 possible things that we could do with the last character in each of these prefixes (that is, the i-th character in A and the j-th character in B):

Pair A[i] with B[j]. Either they match (cost 0) or not (cost 1), but either way, the total cost of this pairing must be that cost (either 0 or 1), plus the smallest possible cost of editing the rest of the prefix of A (A[1 .. i-1]) into the rest of the prefix of B (B[1 .. j-1]) -- which, by assumption, we have already calculated correctly!
Delete A[i]. This costs 1, so the total cost of doing this must be 1 plus the smallest possible cost of editing the rest of the prefix of A (A[1 .. i-1]) into the entire prefix of B (B[1 .. j]) -- which, by assumption, we have already calculated correctly!
Insert B[j]. This costs 1, so the total cost of doing this must be 1 plus the smallest possible cost of editing the entire prefix of A (A[1 .. i]) into the rest of the prefix of B (B[1 .. j-1]) -- which, by assumption, we have already calculated correctly!

Since those 3 things are the only possible things that we could do, and for each of the 3 we have calculated the overall lowest cost of doing that thing, the overall best thing to do must be the best of the 3 of them. This proves that we correctly calculate the lowest cost needed to edit the first i characters of A into the first j characters of B, for any i and j -- so in particular, it's true for i = |A| and j = |B|, that is, for editing the complete string A into the complete string B.

answered Oct 05 '22 07:10

j_random_hacker

Related questions
                            
                                How to pick a random element from an array that matches a certain criteria in O(1)
                            
                                Efficient algorithm to calculate the sum of all k-products
                            
                                Random choices of two values
                            
                                Mapper and Reducer for K means algorithm in Hadoop in Java
                            
                                A puzzle on data structure
                            
                                Overflow-safe modular addition and subtraction in C?
                            
                                designing a algorithm for a large data
                            
                                Which deconvolution algorithm is best suited for removing motion blur from text?
                            
                                Fast algorithm to detect main colors in an image?
                            
                                Last byte in Huffman compression
                            
                                Find out if point is inside one of N (possibly overlapping) rectangles in less than O(N)
                            
                                Sort resource access schedule for multiple threads so the number of writing conflicts gets minimized
                            
                                Determining a string has all unique characters without using additional data structures and without the lowercase characters assumption
                            
                                Is fibonacci search faster than binary search?
                            
                                difference between subquadratic and quadratic algorithm [closed]
                            
                                Algorithm to calculate the odds of a team winning a sports match given full history
                            
                                Efficiently get subset of strings "startingWith" out of a set
                            
                                How do I merge two lists of tuples based on a key?
                            
                                Complexity of Bubble Sort
                            
                                Reversible "hash" function from 64-bit integer to 64-bit integer

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Proof of correct of the dynamic programming approach to min edit distance

Tags:

language-agnostic

algorithm

dynamic-programming