Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Generate regular expression for given string and edit distance

I have the problem that I want to match all strings in the database having a certain edit distance to a given string.

My idea was to generate a regular expression that would match all strings with edit distance d to string s.

So for example I want to generate a regex r for d = 1 and s = 'abc' in the form of: r = 'abc|.abc|.bc|a.c|ab.|abc.' and so on. But I'm not sure if this is very efficient or are there already some good algorithms to that problem? I want to consider even character swaps in the edit distance. so 'acb' should also be part of r. I want to realise it in PHP and then make an SQL query: SELECT * FROM table WHERE name RLIKE TheRegularExpression.

Is it a good way to make it like that? Or what would you recommend?

like image 356
Martin Cup Avatar asked Nov 02 '12 17:11

Martin Cup


2 Answers

You can store a Levenshtein function in Mysql. After that you can simply do the search like this:

mysql_qery("SELECT `term` FROM `words` WHERE levenshtein('$word', `term`) BETWEEN 0 AND '$d'");
like image 149
enrico.bacis Avatar answered Sep 20 '22 06:09

enrico.bacis


Probably the best thing to do is build up an iterative process for all the possibilities. In other words, something like this:

function findall($startString) {
    // create an array of all strings that are distance one away
    // each element would be $returnArray["abc"] = "abc";
}

$d = 2; // distance
$myArray[$startString] = $startString;

for($i = 0; $i < $d; $i++) {
    $newCombos = array_merge(array(), $myArray);
    foreach($myArray as $element) {
        $newCombos = array_merge($newCombos, findall($element));
    }
    $myArray = array_merge(array(), $newCombos);
}

$myRegex = implode("|", $myArray);
like image 40
durron597 Avatar answered Sep 24 '22 06:09

durron597