SQL

Question

I've wrote an SQL query that should be simple, that turns out is not so simple. I have a database of 1.2 million words (several languages)+ more. My neice asked how many words I can make with 5 letters from the letters jxtehmrungce. I then decided to run a test. Well, turns out it's easy to write a query like this. However!~ There must be an easier solution? The more characters, the longer the query.

Below, it's looping through all characters (letters) in order of alphabet

SELECT count(DISTINCT `word`) as `numrows` FROM `words` WHERE LENGTH(`word`) = '5' AND `chars` REGEXP ' ([g{0,1}+]|[i{0,1}+]|[l{0,1}+]|[m{0,1}+]|[n{0,1}+]|[o{0,1}+]|[r{0,1}+]|[t{0,1}+]|[u{0,1}+]|[x{0,1}+])+([g{0,1}+]|[i{0,1}+]|[l{0,1}+]|[m{0,1}+]|[n{0,1}+]|[o{0,1}+]|[r{0,1}+]|[t{0,1}+]|[u{0,1}+]|[x{0,1}+])+([g{0,1}+]|[i{0,1}+]|[l{0,1}+]|[m{0,1}+]|[n{0,1}+]|[o{0,1}+]|[r{0,1}+]|[t{0,1}+]|[u{0,1}+]|[x{0,1}+])+([g{0,1}+]|[i{0,1}+]|[l{0,1}+]|[m{0,1}+]|[n{0,1}+]|[o{0,1}+]|[r{0,1}+]|[t{0,1}+]|[u{0,1}+]|[x{0,1}+])+([g{0,1}+]|[i{0,1}+]|[l{0,1}+]|[m{0,1}+]|[n{0,1}+]|[o{0,1}+]|[r{0,1}+]|[t{0,1}+]|[u{0,1}+]|[x{0,1}+])+([g{0,1}+]|[i{0,1}+]|[l{0,1}+]|[m{0,1}+]|[n{0,1}+]|[o{0,1}+]|[r{0,1}+]|[t{0,1}+]|[u{0,1}+]|[x{0,1}+])+([g{0,1}+]|[i{0,1}+]|[l{0,1}+]|[m{0,1}+]|[n{0,1}+]|[o{0,1}+]|[r{0,1}+]|[t{0,1}+]|[u{0,1}+]|[x{0,1}+])+([g{0,1}+]|[i{0,1}+]|[l{0,1}+]|[m{0,1}+]|[n{0,1}+]|[o{0,1}+]|[r{0,1}+]|[t{0,1}+]|[u{0,1}+]|[x{0,1}+])+([g{0,1}+]|[i{0,1}+]|[l{0,1}+]|[m{0,1}+]|[n{0,1}+]|[o{0,1}+]|[r{0,1}+]|[t{0,1}+]|[u{0,1}+]|[x{0,1}+])+([g{0,1}+]|[i{0,1}+]|[l{0,1}+]|[m{0,1}+]|[n{0,1}+]|[o{0,1}+]|[r{0,1}+]|[t{0,1}+]|[u{0,1}+]|[x{0,1}+])+([g{0,1}+]|[i{0,1}+]|[l{0,1}+]|[m{0,1}+]|[n{0,1}+]|[o{0,1}+]|[r{0,1}+]|[t{0,1}+]|[u{0,1}+]|[x{0,1}+])+([g{0,1}+]|[i{0,1}+]|[l{0,1}+]|[m{0,1}+]|[n{0,1}+]|[o{0,1}+]|[r{0,1}+]|[t{0,1}+]|[u{0,1}+]|[x{0,1}+])' AND `verified` = '1'

I will be using this on yougowords.com for the unscrambler tool against a 3.9 million row table if it works well, but it's a very time consuming query. How can I improve this? There are probably several regular expressions, but, if you change the set of characters to something with double or triple letters, such as adding an extra j, g, h, or adding more letters etc. jjtehhmrungcs

Edit - No duplicating characters, hence why you see 2 e's in the characters, but not 2 t's. ( jxtehmrungce )

{0,1} was a minimum to maximum setting since duplicate characters can have more than 1 and you can make a 5 letter word by using only 1 of the duplicated letters, or both of them. {0,1} could be written as {1,2} - but, also need to set a maximum amount of letters possible. The word can't have 3 e's because jxtehmrungce only has 2.

I have no SQL experience, I'm basing this example off of my own limited knowledge.

Chars Column For a different program, I created the chars column for alphabetical organization of all letters in the word. So the word "life" in order is efil and the word happy would be ahppy. I could use either or column to get the same result with this query but the chars column places characters in order, so jxtehmrungce becomes ceeghjmnrtux. Could help with finding words that have "only" 2 e's?

Gordon Linoff · Accepted Answer

Does this do what you want?

select count(distinct word)
from words w
where word regex '[jxtehmrungce]{5}' and verified = '1';

Or are you looking for permutations of the five characters?

EDIT:

If you are limited to the characters in the list, then you have a more complicated query. I would take the approach of generating all possible combinations and then looking to see if they are words:

create view i_c as
   select 1 as i, 'j' as c union all
   select 2 as i, 'x' as c union all
   select 3 as i, 't' as c union all
   select 4 as i, 'e' as c union all
   select 5 as i, 'h' as c union all
   select 6 as i, 'm' as c union all
   select 7 as i, 'r' as c union all
   select 8 as i, 'u' as c union all
   select 9 as i, 'n' as c union all
   select 10 as i, 'g' as c union all
   select 11 as i, 'c' as c union all
   select 12 as i, 'e' as c;

select count(distinct w.word)
from i_c c1 join
     i_c c2
     on c2.i not in (c1.i) join
     i_c c3
     on c3.i not in (c1.i, c2.i) join
     i_c c4
     on c4.i not in (c1.i, c2.i, c3.i) join
     i_c c4
     on c5.i not in (c1.i, c2.i, c3.i, c4.i) join
     words w
     on concat(c1.c, c2.c, c3.c, c4.c, c5.c) = w.word
where w.verified = 1;

SQL - 1388 Character SQL Query. ( Must Be A Simpler Solution? )

Tags:

regex

mysql

James Cordeiro

1 Answers

Gordon Linoff

Recent Activity

Donate For Us