I would like to be able to search a table as follows for smith as get everything that it within 1 variance.
Data:
O'Brien Smithe Dolan Smuth Wong Smoth Gunther Smiht
I have looked into using Levenshtein distance does anyone know how to implement this with it?
The Levenshtein distance metric measures the difference between two strings. That is the minimum number of single-character edits that are required to change one string into another other. Single-character edits can be insertions, deletions, and substitutions.
The Levenshtein distance is usually calculated by preparing a matrix of size (M+1)x(N+1) —where M and N are the lengths of the 2 words—and looping through said matrix using 2 for loops, performing some calculations within each iteration.
The Levenshtein distance used as a metric provides a boost to accuracy of an NLP model by verifying each named entity in the entry. The vector search solution does a good job, and finds the most similar entry as defined by the vectorization.
The Levenshtein distance is a string metric for measuring difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (i.e. insertions, deletions or substitutions) required to change one word into the other.
In order to efficiently search using levenshtein distance, you need an efficient, specialised index, such as a bk-tree. Unfortunately, no database system I know of, including MySQL, implements bk-tree indexes. This is further complicated if you're looking for full-text search, instead of just a single term per row. Off-hand, I can't think of any way that you could do full-text indexing in a manner that allows for searching based on levenshtein distance.
There is a mysql UDF implementation of Levenshtein Distance function
https://github.com/jmcejuela/Levenshtein-MySQL-UDF
It is implemented in C and has better performance than the "MySQL Levenshtein distance query" mentioned by schnaader
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With