Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Mysql Regular Expression search with no repeating characters

Tags:

regex

php

mysql

I have a database table with words from a dictionary.

Now I want to select words for an anagram. For example if I give the string SEPIAN it should fetch values like apes, pain, pains, pies, pines, sepia, etc.

For this I used the query

SELECT * FROM words WHERE word REGEXP '^[SEPIAN]{1,6}$'

But this query returns words like anna, essen which have repeated characters not in the supplied string. Eg. anna has two n's but there is only one n in the search string SEPIAN.

How can I write my regular expression to achieve this? Also if there are repeated characters in my search string at that time the repeated characters should reflect in the result.

like image 703
Nithin Avatar asked Jul 16 '12 12:07

Nithin


2 Answers

Since MySQL does not support back-referencing capturing groups, the typical solution of (\w).*\1 will not work. This means that any solution given will need to enumerate all possible doubles. Furthermore, as far as I can tell back-references are not valid in look-aheads or look-behinds, and look-aheads and look-behinds are not supported in MySQL.

However, you can split this into two expressions, and use the following query:

SELECT * FROM words
WHERE word REGEXP '^[SEPIAN]{1,6}$'
AND NOT word REGEXP 'S.*?S|E.*?E|P.*?P|I.*?I|A.*?A|N.*?N'

Not very pretty, but it works and it should be fairly efficient as well.


To support a set limit of repeated characters, use the following pattern for your secondary expression:

A(.*?A){X,}

Where A is your character and X is the number of times it's allowed.

So if you're adding another N to your string SEPIANN (for a total of 2 Ns), your query would become:

SELECT * FROM words
WHERE word REGEXP '^[SEPIAN]{1,7}$'
AND NOT word REGEXP 'S.*?S|E.*?E|P.*?P|I.*?I|A.*?A|N(.*?N){2}'
like image 197
dlras2 Avatar answered Oct 20 '22 21:10

dlras2


I guess something like this will help you. Table words:

| id    | word      | alfagram  |
---------------------------------
| 1     | karabar   | aaabkrr   |
| 2     | malabar   | aaablmr   |
| 3     | trantantan| aaannnrttt|

alfagram here is letters of a word in an alphabetical order.

PHP code:

$searchString = 'abrakadabra';
$searchStringAlfa = array();
for( $i=0,$c=strlen($searchString);$i<$c;$i++ ){
    if( isset($searchStringAlfa[$searchString[$i]]) ){
        $searchStringAlfa[$searchString[$i]]++;
    }else{
        $searchStringAlfa[$searchString[$i]] = 1;
    }
}
ksort($searchStringAlfa);
$regexp = '^';
foreach( $searchStringAlfa as $alfa=>$amount ){
    $regexp .= '['.$alfa.']{0,'.$amount.'}';
}
$regexp .= '$';

$searchString is the string you wish to search with. Then the only thing you should do is execute the query:

$result = mysql_query('SELECT * FROM words WHERE alfagram REGEXP "'.$regexp.'"');

May be some additional checks and optimisations are needed

like image 44
Timur Avatar answered Oct 20 '22 20:10

Timur