Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Implementation of Levenshtein distance for mysql/fuzzy search?

I would like to be able to search a table as follows for smith as get everything that it within 1 variance.

Data:

 O'Brien Smithe Dolan Smuth Wong Smoth Gunther Smiht 

I have looked into using Levenshtein distance does anyone know how to implement this with it?

like image 865
Andrew Clark Avatar asked Mar 11 '09 15:03

Andrew Clark


People also ask

What is Levenshtein distance in SQL?

The Levenshtein distance metric measures the difference between two strings. That is the minimum number of single-character edits that are required to change one string into another other. Single-character edits can be insertions, deletions, and substitutions.

How is Levenshtein distance calculated?

The Levenshtein distance is usually calculated by preparing a matrix of size (M+1)x(N+1) —where M and N are the lengths of the 2 words—and looping through said matrix using 2 for loops, performing some calculations within each iteration.

Is Levenshtein distance NLP?

The Levenshtein distance used as a metric provides a boost to accuracy of an NLP model by verifying each named entity in the entry. The vector search solution does a good job, and finds the most similar entry as defined by the vectorization.

Is Levenshtein distance an algorithm?

The Levenshtein distance is a string metric for measuring difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (i.e. insertions, deletions or substitutions) required to change one word into the other.


2 Answers

In order to efficiently search using levenshtein distance, you need an efficient, specialised index, such as a bk-tree. Unfortunately, no database system I know of, including MySQL, implements bk-tree indexes. This is further complicated if you're looking for full-text search, instead of just a single term per row. Off-hand, I can't think of any way that you could do full-text indexing in a manner that allows for searching based on levenshtein distance.

like image 59
Nick Johnson Avatar answered Sep 21 '22 04:09

Nick Johnson


There is a mysql UDF implementation of Levenshtein Distance function

https://github.com/jmcejuela/Levenshtein-MySQL-UDF

It is implemented in C and has better performance than the "MySQL Levenshtein distance query" mentioned by schnaader

like image 30
Hongzheng Avatar answered Sep 21 '22 04:09

Hongzheng