Improved Levenshtein Algorithm

Question

I recently implemented the levenshtein algorithm into our search engine database, but we have come across a problem.

According to the basic levenshtein

Levenshtein('123456','12x456') is the same value as Levenshtein('123456', '12345x')

Normally this is fine, but for my specific problem that is incorrect. When someone uses our website, this is incorrect. Manufacturers of electronic components often make similar products with only a difference in the very last letter. If the first letter is different, it's usually an entirely different category. Therefore I need an algorithm that considers matches near the beginning of the word more valuable than those in the back or in other words, mismatches that occur near the beginning should apply a larger penalty than those in the back.

If anyone has any idea please let me know.

Will Castillo · Accepted Answer

Use the Jaro-Winkler Distance... It's exactly what you are asking for.

Pierre · Answer

See the Smith-Waterman algorithm widely used in bioinformatics. It can perform a local alignment of your query, but that will be slower that Levenshtein.

Rob Di Marco · Answer

We had a similar issue and used the Brew edit distance method

We were in Perl so we used the Text::Brew library. My co-worker did a nice presentation on using a few different algorithms, including Brew.

Improved Levenshtein Algorithm

Tags:

algorithm

levenshtein-distance

Mike D

3 Answers

Will Castillo

Pierre

Rob Di Marco

Recent Activity

Donate For Us

Improved Levenshtein Algorithm

Tags:

algorithm

levenshtein-distance

Mike D

3 Answers

Will Castillo

Pierre

Rob Di Marco

Related questions

Recent Activity

Donate For Us