Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Edit distance recursive algorithm -- Skiena

I'm reading The Algorithm Design Manual by Steven Skiena, and I'm on the dynamic programming chapter. He has some example code for edit distance and uses some functions which are explained neither in the book nor on the internet. So I'm wondering

a) how does this algorithm work?

b) what do the functions indel and match do?

#define MATCH     0       /* enumerated type symbol for match */
#define INSERT    1       /* enumerated type symbol for insert */
#define DELETE    2       /* enumerated type symbol for delete */

int string_compare(char *s, char *t, int i, int j)
{
        int k;                  /* counter */
        int opt[3];             /* cost of the three options */
        int lowest_cost;        /* lowest cost */

        if (i == 0) return(j * indel(' '));
        if (j == 0) return(i * indel(' '));

        opt[MATCH] = string_compare(s,t,i-1,j-1) + match(s[i],t[j]);
        opt[INSERT] = string_compare(s,t,i,j-1) + indel(t[j]);
        opt[DELETE] = string_compare(s,t,i-1,j) + indel(s[i]);

        lowest_cost = opt[MATCH];
        for (k=INSERT; k<=DELETE; k++)
                if (opt[k] < lowest_cost) lowest_cost = opt[k];

        return( lowest_cost );
}
like image 911
ordinary Avatar asked Oct 07 '13 05:10

ordinary


People also ask

How does edit distance algorithm work?

In computational linguistics and computer science, edit distance is a string metric, i.e. a way of quantifying how dissimilar two strings (e.g., words) are to one another, that is measured by counting the minimum number of operations required to transform one string into the other.

What is edit distance in bioinformatics?

Edit distance measures the similarity between two strings (as the minimum number of change, insert or delete operations that transform one string to the other). An edit sequence s is a sequence of such operations and can be used to represent the string resulting from applying s to a reference string.

What is edit distance matrix in NLP?

Simply put, edit distance is a measurement of how many changes we must do to one string to transform it into the string we are comparing it to. As an illustration, the difference between “Frederic” and “Fred” is four, as we can change “Frederic” into “Fred” with the deletion of the letters “e” , “r”, “i” and ”c”.


2 Answers

On page 287 in the book:

int match(char c, char d)
{
  if (c == d) return(0); 
  else return(1); 
}

int indel(char c)
{
  return(1);
}
like image 64
Zsolt Safrany Avatar answered Oct 10 '22 02:10

Zsolt Safrany


They're explained in the book. Please read section 8.2.4 Varieties of Edit Distance

like image 45
gigaplex Avatar answered Oct 10 '22 00:10

gigaplex