 

Calculating the complexity of Levenshtein Edit Distance

I have been looking at this simple Python implementation of the Levenshtein edit distance all day now.

def lev(a, b):
    """Recursively calculate the Levenshtein edit distance between two strings, a and b.
    Returns the edit distance.
    """
    if a == "":
        return len(b)   # only insertions of b's characters remain
    if b == "":
        return len(a)   # only insertions of a's characters remain
    return min(lev(a[:-1], b[:-1]) + (a[-1] != b[-1]),  # substitute (or match) the last characters
               lev(a[:-1], b) + 1,                      # delete the last character of a
               lev(a, b[:-1]) + 1)                      # delete the last character of b
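(A quick sanity check of my own, not part of the original question: on the classic "kitten"/"sitting" pair the function should return 3. The function is reproduced here so the snippet runs on its own.)

```python
def lev(a, b):
    """Recursive Levenshtein edit distance, same as the implementation above."""
    if a == "":
        return len(b)
    if b == "":
        return len(a)
    return min(lev(a[:-1], b[:-1]) + (a[-1] != b[-1]),
               lev(a[:-1], b) + 1,
               lev(a, b[:-1]) + 1)

print(lev("kitten", "sitting"))  # → 3 (substitute k→s, substitute e→i, insert g)
```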

From: http://www.clear.rice.edu/comp130/12spring/editdist/

I know it has exponential complexity, but how would I go about deriving that complexity from scratch?

I have searched all over the internet but found no explanations, only statements that it is exponential.

Thanks.

Asked Jan 31 '13 by John

1 Answer

  1. Draw the call tree (which you apparently have already done).

  2. Abstract from the call tree. For arbitrary n, determine the depth d of the tree as a function of n.

    Also, determine how many branches/children there are per node, on average, as n approaches infinity; that's called the average branching factor b.

  3. Realize that visiting every node in a tree of depth d with average branching factor b takes at least on the order of b ^ d operations. Write that figure in terms of n and you have a lower bound on complexity in terms of the input size.
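To make the lower-bound argument above concrete, here is a small sketch of my own (not part of the answer): an instrumented copy of the function that counts every call, compared against 3^n for equal-length inputs. The counts grow at least as fast as 3^n, as the branching-factor argument predicts.

```python
def lev_counted(a, b, counter):
    """Same recursion as lev(), but increments counter[0] on every call."""
    counter[0] += 1
    if a == "":
        return len(b)
    if b == "":
        return len(a)
    return min(lev_counted(a[:-1], b[:-1], counter) + (a[-1] != b[-1]),
               lev_counted(a[:-1], b, counter) + 1,
               lev_counted(a, b[:-1], counter) + 1)

def count_calls(n):
    """Total number of calls made for two (maximally different) strings of length n."""
    counter = [0]
    lev_counted("a" * n, "b" * n, counter)
    return counter[0]

for n in range(1, 7):
    print(n, count_calls(n), 3 ** n)  # call count vs. the 3^n lower bound
```

(The exact counts actually grow somewhat faster than 3^n, which is consistent with 3^n being only a lower bound on the size of the call tree.)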

More specifically: you keep recursing until you hit an empty string, taking one character off each time. If we call the lengths of the strings m and n, then the depth of the tree is min(m, n). At every node in the call tree except the leaves, you recurse exactly three times, so in the limit the average branching factor is 3. That gives us a call tree of Θ(3^min(m, n)) nodes. The worst case occurs when m = n, so we can call that Θ(3^n).

This is still only a lower bound on the complexity. For the full picture, you should also take into account the amount of work done between recursive calls. In this naive code, that's actually linear time because a[:-1] has to copy (at Θ(n) cost) almost all of a, giving Θ(n 3^n) total complexity.*
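As an illustration of that last point (my own sketch, not from the answer): the per-call Θ(n) slicing cost can be avoided by recursing on prefix lengths instead of copying slices, so each node of the (still exponential) call tree does only O(1) work.

```python
def lev_idx(a, b):
    """Same recursion as lev(), but on indices, so no string copies are made."""
    def go(i, j):
        # i and j are the lengths of the prefixes of a and b still being compared
        if i == 0:
            return j
        if j == 0:
            return i
        return min(go(i - 1, j - 1) + (a[i - 1] != b[j - 1]),
                   go(i - 1, j) + 1,
                   go(i, j - 1) + 1)
    return go(len(a), len(b))

print(lev_idx("kitten", "sitting"))  # → 3, same result as the slicing version
```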

[* I once caught a CS professor using Python's slicing in a binary search, which as a result ran in time Θ(n lg n).]

Answered Sep 19 '22 by Fred Foo