Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PostgreSQL and word games

In a word game similar to Ruzzle or Letterpress, where users have to construct words out of a given set of letters:

enter image description here

I keep my dictionary in a simple SQL table:

create table good_words (
        word varchar(16) primary key
);

Since the game duration is very short I do not want to check every entered word by calling a PHP script, which would look that word up in the good_words table.

Instead I'd like to download all possible words by one PHP script call before the round starts - since all letters are known.

My question is: if there is a nice SQLish way to find such words?

I.e. I could run a longer-taking script once to add a column to good_words table, which would have same letters as in the word columnt, but sorted alphabetically... But I still can't think of a way to match for it given a set of letters.

And doing the word matching inside of a PHP script (vs. inside the database) would probably take too long (because of bandwidth: would have to fetch every row from the database to the PHP script).

Any suggestions or insights please?

Using postgresql-8.4.13 with CentOS Linux 6.3.

UPDATE:

Other ideas I have:

  1. Create a constantly running script (cronjob or daemon) which would prefill an SQL table with precompiled letters board and possible words - but still feels like a waste of bandwidth and CPU, I would prefer to solve this inside the database
  2. Add integer columns a, b, ... , z and whenever I store a word into good_words, store the letter occurences there. I wonder if it is possible to create an insert trigger in Pl/PgSQL for that?
like image 814
Alexander Farber Avatar asked Mar 05 '13 09:03

Alexander Farber


2 Answers

Nice question, I upvoted.

What you're up to is a list of all possible permutations of the given letters of a given length. As described in the PostgreSQL wiki, you can create a function and call it like this (matches highlighted letters in your screenshot):

SELECT * FROM permute('{E,R,O,M}'::text[]);

Now, to query the good_words use something like:

SELECT gw.word, gw.stamp
  FROM good_words gw
  JOIN permute('{E,R,O,M}'::text[]) s(w) ON gw.word=array_to_string(s.w, '');
like image 122
vyegorov Avatar answered Sep 21 '22 11:09

vyegorov


This could be a start, except that it doesn't check if we have enough letters, only if he have the right letters.

SELECT word from
(select word,generate_series(0,length(word)) as s from good_words) as q
WHERE substring(word,s,1) IN ('t','h','e','l','e','t','t','e','r','s')
GROUP BY word
HAVING count(*)>=length(word);

http://sqlfiddle.com/#!1/2e3a2/3

EDIT:

This query select only the valid words though it seems a bit redundant. It's not perfect but certainly proves it can be done.

WITH words AS 
(SELECT word, substring(word,s,1) as sub from
(select word,generate_series(1,length(word)) as s from good_words) as q
WHERE substring(word,s,1) IN ('t','e','s','e','r','e','r','o','r','e','m','a','s','d','s','s'))

SELECT w.word FROM
(
SELECT word,words.sub,count(DISTINCT s) as cnt FROM
(SELECT s, substring(array_to_string(l, ''),s,1) as sub FROM
(SELECT l, generate_subscripts(l,1) as s FROM 
 (SELECT ARRAY['t','e','s','e','r','e','r','o','r','e','m','a','s','d','s','s'] as l) 
 as q) 
as q) as let JOIN
words ON let.sub=words.sub
GROUP BY words.word,words.sub) as let
JOIN
(select word,sub,count(*) as cnt from words
 GROUP BY word, sub)
as w ON let.word=w.word AND let.sub=w.sub AND let.cnt>=w.cnt
GROUP BY w.word
HAVING sum(w.cnt)=length(w.word);

Fiddle with all possible 3+ letters words (485) for that image: http://sqlfiddle.com/#!1/2fc66/1 Fiddle with 699 words out of which 485 are correct: http://sqlfiddle.com/#!1/4f42e/1

Edit 2: We can use array operators like so to get a list of words that contain the letters we want:

SELECT word as sub from
(select word,generate_series(1,length(word)) as s from good_words) as q
GROUP BY word
HAVING array_agg(substring(word,s,1)) <@ ARRAY['t','e','s','e','r','e','r','o','r','e','m','a','s','d','s','s'];

So we can use it to narrow down the list of words we need to check.

WITH words AS 
(SELECT word, substring(word,s,1) as sub from
(select word,generate_series(1,length(word)) as s from 
(
  SELECT word from
(select word,generate_series(1,length(word)) as s from good_words) as q
GROUP BY word
HAVING array_agg(substring(word,s,1)) <@ ARRAY['t','e','s','e','r','e','r','o','r','e','m','a','s','d','s','s']
)as q) as q)
SELECT DISTINCT w.word FROM
(
SELECT word,words.sub,count(DISTINCT s) as cnt FROM
(SELECT s, substring(array_to_string(l, ''),s,1) as sub FROM
(SELECT l, generate_subscripts(l,1) as s FROM 
 (SELECT ARRAY['t','e','s','e','r','e','r','o','r','e','m','a','s','d','s','s'] as l) 
 as q) 
as q) as let JOIN
words ON let.sub=words.sub
GROUP BY words.word,words.sub) as let
JOIN
(select word,sub,count(*) as cnt from words
 GROUP BY word, sub)
as w ON let.word=w.word AND let.sub=w.sub AND let.cnt>=w.cnt
GROUP BY w.word
HAVING sum(w.cnt)=length(w.word) ORDER BY w.word;

http://sqlfiddle.com/#!1/4f42e/44

We can use GIN indexes to work on arrays so we probably could create a table that would store the arrays of letters and make words point to it (act, cat and tact would all point to array [a,c,t]) so probably that would speed things up but that's up for testing.

like image 32
Jakub Kania Avatar answered Sep 20 '22 11:09

Jakub Kania