Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Evaluating the distribution of words in a grid

I'm creating a word search and am trying to calculate quality of the generated puzzles by verifying the word set is "distributed evenly" throughout the grid. For example placing each word consecutively, filling them up row-wise is not particularly interesting because there will be clusters and the user will quickly notice a pattern.

How can I measure how 'evenly distributed' the words are?

What I'd like to do is write a program that takes in a word search as input and output a score that evaluates the 'quality' of the puzzle. I'm wondering if anyone has seen a similar problem and could refer me to some resources. Perhaps there is some concept in statistics that might help? Thanks.

like image 257
qwer Avatar asked Oct 14 '22 10:10

qwer


1 Answers

The basic problem is distribution of lines in a square or rectangle. You can eighter do this geometrically or using integer arrays. I will try the integer arrays here.

Let M be a matrix of your puzzle,

A B C D
E F G H
I J K L
M N O P

Let the word "EFGH" be an existent word, as well as "CGKO". Then, create a matrix which will contain the count of membership in eighter words in each cell:

0 0 1 0
1 1 2 1
0 0 1 0
0 0 1 0

Apply a rule: the current cell value is equal to the sum of all neighbours (4-way) and multiply with the cell's original value, if the original value is 2 or higher.

0 0 1 0      1 2 2 2
1 1 2 1  -\  1 3 8 2
0 0 1 0  -/  1 2 3 2
0 0 1 0      0 1 1 1

And sum up all values in rows and columns the matrix:

1 2 2 2 =  7
1 3 8 2 = 14
1 2 3 2 =  8
0 1 1 1 =  3
| | | |
3 7 | 6
    14

Then calculate the avarage of both result sets:

(7 + 14 + 8 + 3) / 4 = 32 / 4 = 8
(3 + 7 + 14 + 6) / 4 = 30 / 4 = 7.5

And calculate the avarage difference to the avarage of each result set:

3  <-> 7.5 = 4.5       7  <-> 8 = 1
7  <-> 7.5 = 0.5       14 <-> 8 = 6
14 <-> 7.5 = 6.5       8  <-> 8 = 0
6  <-> 7.5 = 1.5       3  <-> 8 = 5
             ___avg               ___avg
             3.25                 3

And multiply them together:

3 * 3.25 = 9.75

Which you treat as a distributionscore. You might need to tweak it a little bit to make it work better, but this should calculate distributionscores quite nicely.

Here is an example of a bad distribution:

1 0 0 0      1 1 0 0      2
1 0 0 0  -\  2 1 0 0  -\  3         -\  C avg 2.5  -\  C avg-2-avg 0.5
1 0 0 0  -/  2 1 0 0  -/  3         -/  R avg 2.5  -/  R avg-2-avg 2.5
1 0 0 0      1 1 0 0      2                                       _____*
                           6 4 0 0                                 1.25 < score

Edit: calc. errors fixed.

like image 106
Pindatjuh Avatar answered Oct 20 '22 01:10

Pindatjuh