I have a database table with words from a dictionary.
Now I want to select words for an anagram. For example if I give the string SEPIAN
it should fetch values like apes
, pain
, pains
, pies
, pines
, sepia
, etc.
For this I used the query
SELECT * FROM words WHERE word REGEXP '^[SEPIAN]{1,6}$'
But this query returns words like anna
, essen
which have repeated characters not in the supplied string. Eg. anna
has two n
's but there is only one n
in the search string SEPIAN
.
How can I write my regular expression to achieve this? Also if there are repeated characters in my search string at that time the repeated characters should reflect in the result.
Since MySQL does not support back-referencing capturing groups, the typical solution of (\w).*\1
will not work. This means that any solution given will need to enumerate all possible doubles. Furthermore, as far as I can tell back-references are not valid in look-aheads or look-behinds, and look-aheads and look-behinds are not supported in MySQL.
However, you can split this into two expressions, and use the following query:
SELECT * FROM words
WHERE word REGEXP '^[SEPIAN]{1,6}$'
AND NOT word REGEXP 'S.*?S|E.*?E|P.*?P|I.*?I|A.*?A|N.*?N'
Not very pretty, but it works and it should be fairly efficient as well.
To support a set limit of repeated characters, use the following pattern for your secondary expression:
A(.*?A){X,}
Where A
is your character and X
is the number of times it's allowed.
So if you're adding another N
to your string SEPIANN
(for a total of 2 N
s), your query would become:
SELECT * FROM words
WHERE word REGEXP '^[SEPIAN]{1,7}$'
AND NOT word REGEXP 'S.*?S|E.*?E|P.*?P|I.*?I|A.*?A|N(.*?N){2}'
I guess something like this will help you. Table words
:
| id | word | alfagram |
---------------------------------
| 1 | karabar | aaabkrr |
| 2 | malabar | aaablmr |
| 3 | trantantan| aaannnrttt|
alfagram
here is letters of a word in an alphabetical order.
PHP code:
$searchString = 'abrakadabra';
$searchStringAlfa = array();
for( $i=0,$c=strlen($searchString);$i<$c;$i++ ){
if( isset($searchStringAlfa[$searchString[$i]]) ){
$searchStringAlfa[$searchString[$i]]++;
}else{
$searchStringAlfa[$searchString[$i]] = 1;
}
}
ksort($searchStringAlfa);
$regexp = '^';
foreach( $searchStringAlfa as $alfa=>$amount ){
$regexp .= '['.$alfa.']{0,'.$amount.'}';
}
$regexp .= '$';
$searchString
is the string you wish to search with. Then the only thing you should do is execute the query:
$result = mysql_query('SELECT * FROM words WHERE alfagram REGEXP "'.$regexp.'"');
May be some additional checks and optimisations are needed
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With