I have the problem that I want to match all strings in the database having a certain edit distance to a given string.
My idea was to generate a regular expression that would match all strings with edit distance d
to string s
.
So for example I want to generate a regex r
for d = 1
and s = 'abc'
in the form of: r = 'abc|.abc|.bc|a.c|ab.|abc.'
and so on. But I'm not sure if this is very efficient or are there already some good algorithms to that problem? I want to consider even character swaps in the edit distance. so 'acb'
should also be part of r
. I want to realise it in PHP and then make an SQL query: SELECT * FROM table WHERE name RLIKE TheRegularExpression
.
Is it a good way to make it like that? Or what would you recommend?
You can store a Levenshtein function in Mysql. After that you can simply do the search like this:
mysql_qery("SELECT `term` FROM `words` WHERE levenshtein('$word', `term`) BETWEEN 0 AND '$d'");
Probably the best thing to do is build up an iterative process for all the possibilities. In other words, something like this:
function findall($startString) {
// create an array of all strings that are distance one away
// each element would be $returnArray["abc"] = "abc";
}
$d = 2; // distance
$myArray[$startString] = $startString;
for($i = 0; $i < $d; $i++) {
$newCombos = array_merge(array(), $myArray);
foreach($myArray as $element) {
$newCombos = array_merge($newCombos, findall($element));
}
$myArray = array_merge(array(), $newCombos);
}
$myRegex = implode("|", $myArray);
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With