Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Levenshtein algorithm: How do I meet this text editing requirements?

I'm using levenshtein algorithm to meet these requirements:

When finding a word of N characters, the words to suggest as correction in my dictionary database are:

Every dictionary word of N characters that has 1 character of difference with the found word. Example: found word:bearn, dictionary word: bears

Every dictionary word of N+1 characters that has N characters equal to the found word. Example: found word:bear, dictionary word: bears

Every dictionary word of N-1 characters that has N-1 characters equal to the found word. Example: found word: bears, dictionary word: bear

I'm using this implementation of Levenshtein algorithm in C++ to find when a word has a Levenshtein number of 1 (which is the Levenshtein number for all three cases), but then how do I choose the word to suggest? I read about Boyer-Moore-Horspool and Knuth-Morris-Pratt but I'm not sure on how either of them can be helpful.

#include <string>
#include <vector>
#include <algorithm>

using namespace std;

int levenshtein(const string &s1, const string &s2)
{
   string::size_type N1 = s1.length();
   string::size_type N2 = s2.length();
   string::size_type i, j;
   vector<int> T(N2+1);

   for ( i = 0; i <= N2; i++ )
      T[i] = i;

   for ( i = 0; i < N1; i++ ) {
      T[0] = i+1;
      int corner = i;
      for ( j = 0; j < N2; j++ ) {
         int upper = T[j+1];
         if ( s1[i] == s2[j] )
            T[j+1] = corner;
         else
            T[j+1] = min(T[j], min(upper, corner)) + 1;
         corner = upper;
      }
   }
   return T[N2];
}
like image 388
franvergara66 Avatar asked Nov 16 '25 13:11

franvergara66


2 Answers

You may also want to add Norvig's excellent article on spelling correction to your reading.

It's been a while since I've read it but I remember it being very similar to what your writing about.

like image 78
Kenan Banks Avatar answered Nov 19 '25 01:11

Kenan Banks


As I've said elsewhere, Boyer-Moore isn't really apt for this. Since you want to search for multiple stings simultanously, the algorithm of Wu and Manber should be more to your liking.

I've posted a proof of concept C++ code in answer to another question. Heed the caveats mentioned there.

like image 32
Konrad Rudolph Avatar answered Nov 19 '25 02:11

Konrad Rudolph



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!