I've wrote an SQL query that should be simple, that turns out is not so simple. I have a database of 1.2 million words (several languages)+ more. My neice asked how many words I can make with 5 letters from the letters jxtehmrungce. I then decided to run a test. Well, turns out it's easy to write a query like this. However!~ There must be an easier solution? The more characters, the longer the query.
Below, it's looping through all characters (letters) in order of alphabet
SELECT count(DISTINCT `word`) as `numrows` FROM `words` WHERE LENGTH(`word`) = '5' AND `chars` REGEXP ' ([g{0,1}+]|[i{0,1}+]|[l{0,1}+]|[m{0,1}+]|[n{0,1}+]|[o{0,1}+]|[r{0,1}+]|[t{0,1}+]|[u{0,1}+]|[x{0,1}+])+([g{0,1}+]|[i{0,1}+]|[l{0,1}+]|[m{0,1}+]|[n{0,1}+]|[o{0,1}+]|[r{0,1}+]|[t{0,1}+]|[u{0,1}+]|[x{0,1}+])+([g{0,1}+]|[i{0,1}+]|[l{0,1}+]|[m{0,1}+]|[n{0,1}+]|[o{0,1}+]|[r{0,1}+]|[t{0,1}+]|[u{0,1}+]|[x{0,1}+])+([g{0,1}+]|[i{0,1}+]|[l{0,1}+]|[m{0,1}+]|[n{0,1}+]|[o{0,1}+]|[r{0,1}+]|[t{0,1}+]|[u{0,1}+]|[x{0,1}+])+([g{0,1}+]|[i{0,1}+]|[l{0,1}+]|[m{0,1}+]|[n{0,1}+]|[o{0,1}+]|[r{0,1}+]|[t{0,1}+]|[u{0,1}+]|[x{0,1}+])+([g{0,1}+]|[i{0,1}+]|[l{0,1}+]|[m{0,1}+]|[n{0,1}+]|[o{0,1}+]|[r{0,1}+]|[t{0,1}+]|[u{0,1}+]|[x{0,1}+])+([g{0,1}+]|[i{0,1}+]|[l{0,1}+]|[m{0,1}+]|[n{0,1}+]|[o{0,1}+]|[r{0,1}+]|[t{0,1}+]|[u{0,1}+]|[x{0,1}+])+([g{0,1}+]|[i{0,1}+]|[l{0,1}+]|[m{0,1}+]|[n{0,1}+]|[o{0,1}+]|[r{0,1}+]|[t{0,1}+]|[u{0,1}+]|[x{0,1}+])+([g{0,1}+]|[i{0,1}+]|[l{0,1}+]|[m{0,1}+]|[n{0,1}+]|[o{0,1}+]|[r{0,1}+]|[t{0,1}+]|[u{0,1}+]|[x{0,1}+])+([g{0,1}+]|[i{0,1}+]|[l{0,1}+]|[m{0,1}+]|[n{0,1}+]|[o{0,1}+]|[r{0,1}+]|[t{0,1}+]|[u{0,1}+]|[x{0,1}+])+([g{0,1}+]|[i{0,1}+]|[l{0,1}+]|[m{0,1}+]|[n{0,1}+]|[o{0,1}+]|[r{0,1}+]|[t{0,1}+]|[u{0,1}+]|[x{0,1}+])+([g{0,1}+]|[i{0,1}+]|[l{0,1}+]|[m{0,1}+]|[n{0,1}+]|[o{0,1}+]|[r{0,1}+]|[t{0,1}+]|[u{0,1}+]|[x{0,1}+])' AND `verified` = '1'
I will be using this on yougowords.com for the unscrambler tool against a 3.9 million row table if it works well, but it's a very time consuming query. How can I improve this? There are probably several regular expressions, but, if you change the set of characters to something with double or triple letters, such as adding an extra j, g, h, or adding more letters etc. jjtehhmrungcs
Edit - No duplicating characters, hence why you see 2 e's in the characters, but not 2 t's. ( jxtehmrungce )
I have no SQL experience, I'm basing this example off of my own limited knowledge.
Chars Column For a different program, I created the chars column for alphabetical organization of all letters in the word. So the word "life" in order is efil and the word happy would be ahppy. I could use either or column to get the same result with this query but the chars column places characters in order, so jxtehmrungce becomes ceeghjmnrtux. Could help with finding words that have "only" 2 e's?
Does this do what you want?
select count(distinct word)
from words w
where word regex '[jxtehmrungce]{5}' and verified = '1';
Or are you looking for permutations of the five characters?
EDIT:
If you are limited to the characters in the list, then you have a more complicated query. I would take the approach of generating all possible combinations and then looking to see if they are words:
create view i_c as
select 1 as i, 'j' as c union all
select 2 as i, 'x' as c union all
select 3 as i, 't' as c union all
select 4 as i, 'e' as c union all
select 5 as i, 'h' as c union all
select 6 as i, 'm' as c union all
select 7 as i, 'r' as c union all
select 8 as i, 'u' as c union all
select 9 as i, 'n' as c union all
select 10 as i, 'g' as c union all
select 11 as i, 'c' as c union all
select 12 as i, 'e' as c;
select count(distinct w.word)
from i_c c1 join
i_c c2
on c2.i not in (c1.i) join
i_c c3
on c3.i not in (c1.i, c2.i) join
i_c c4
on c4.i not in (c1.i, c2.i, c3.i) join
i_c c4
on c5.i not in (c1.i, c2.i, c3.i, c4.i) join
words w
on concat(c1.c, c2.c, c3.c, c4.c, c5.c) = w.word
where w.verified = 1;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With