
Levenshtein distance in C#: counting each error type

I found this bit of code that computes Levenshtein's distance between an answer and a guess:

int CheckErrors(string Answer, string Guess)
{
    int[,] d = new int[Answer.Length + 1, Guess.Length + 1];
    for (int i = 0; i <= Answer.Length; i++)
        d[i, 0] = i;
    for (int j = 0; j <= Guess.Length; j++)
        d[0, j] = j;
    for (int j = 1; j <= Guess.Length; j++)
        for (int i = 1; i <= Answer.Length; i++)
            if (Answer[i - 1] == Guess[j - 1])
                d[i, j] = d[i - 1, j - 1];  //no operation
            else
                d[i, j] = Math.Min(Math.Min(
                    d[i - 1, j] + 1,     //a deletion
                    d[i, j - 1] + 1),    //an insertion
                    d[i - 1, j - 1] + 1  //a substitution
                );
    return d[Answer.Length, Guess.Length];
}

But I need a way to count how many times each type of error (deletion, insertion, substitution) occurs. Is there an easy way to implement that?

asked Mar 21 '13 22:03 by user1988332



1 Answer

Seems like you could add counters for each of the operations:

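                // Fragment: replaces the body of the inner loop in CheckErrors.
                // Assumes int deletions, insertions and substitutions are
                // declared and set to 0 before the loops.
                if (Answer[i - 1] == Guess[j - 1])
                    d[i, j] = d[i - 1, j - 1];  //no operation
                else
                {
                    int del = d[i - 1, j] + 1;
                    int ins = d[i, j - 1] + 1;
                    int sub = d[i - 1, j - 1] + 1;
                    int op = Math.Min(Math.Min(del, ins), sub);
                    d[i, j] = op;
                    // Tallies the cheapest operation on diagonal cells only
                    // (i == j). This does not follow the optimal edit path,
                    // so treat the counts as a rough estimate.
                    if (i == j)
                    {
                        if (op == del)
                            ++deletions;
                        else if (op == ins)
                            ++insertions;
                        else
                            ++substitutions;
                    }
                }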
answered Oct 11 '22 19:10 by Jim Mischel
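
For exact counts, one option is to fill the matrix as in the question and then walk back from d[Answer.Length, Guess.Length] to d[0, 0], tallying the operation taken at each step. The following is a minimal sketch of that idea, not the answerer's code. EditCounter and CountErrors are hypothetical names, and when several optimal edit scripts exist, the tie-breaking order used here (substitution, then deletion, then insertion) is an arbitrary choice. It assumes C# 7 or later for the tuple return type.

```csharp
using System;

public static class EditCounter
{
    // Hypothetical helper (not from the original post): builds the same
    // matrix as CheckErrors, then backtracks to count each operation
    // needed to turn `answer` into `guess`.
    public static (int Distance, int Deletions, int Insertions, int Substitutions)
        CountErrors(string answer, string guess)
    {
        int[,] d = new int[answer.Length + 1, guess.Length + 1];
        for (int i = 0; i <= answer.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= guess.Length; j++) d[0, j] = j;

        for (int j = 1; j <= guess.Length; j++)
            for (int i = 1; i <= answer.Length; i++)
            {
                if (answer[i - 1] == guess[j - 1])
                    d[i, j] = d[i - 1, j - 1];          // no operation
                else
                    d[i, j] = Math.Min(Math.Min(
                        d[i - 1, j] + 1,                // deletion
                        d[i, j - 1] + 1),               // insertion
                        d[i - 1, j - 1] + 1);           // substitution
            }

        // Walk back from the bottom-right cell to the origin, choosing at
        // each cell a predecessor that explains the stored value.
        int x = answer.Length, y = guess.Length;
        int deletions = 0, insertions = 0, substitutions = 0;
        while (x > 0 || y > 0)
        {
            if (x > 0 && y > 0 && answer[x - 1] == guess[y - 1]
                && d[x, y] == d[x - 1, y - 1])
            {
                x--; y--;                               // characters match
            }
            else if (x > 0 && y > 0 && d[x, y] == d[x - 1, y - 1] + 1)
            {
                substitutions++; x--; y--;
            }
            else if (x > 0 && d[x, y] == d[x - 1, y] + 1)
            {
                deletions++; x--;                       // char missing from guess
            }
            else
            {
                insertions++; y--;                      // extra char in guess
            }
        }

        return (d[answer.Length, guess.Length], deletions, insertions, substitutions);
    }
}
```

For example, EditCounter.CountErrors("kitten", "sitting") gives a distance of 3, made up of two substitutions and one insertion. The three counts always sum to the distance, because every step of the walk either is a match or lowers the remaining cost by exactly one.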