largest possible rectangle of letters

Question

Write a program to find the largest possible rectangle of letters such that every row forms a word (left to right) and every column forms a word (top to bottom).

I found this interesting question. It's not homework, though it may sound as such. (I'm not in school). I'm doing this for fun.

Example

From cat, car, ape, api, rep, tip we get the following rectangle (which is a square):
c a r
a p e
t i p

My initial idea is to build a some sort of a prefix tree so I can retrieve all words that start with a specific string. This would be useful when we already have 2 or more words (either reading top to bottom or left to right) and we need to find the next word to add.

Any other ideas?

Edit

Could this be done with a cuboid (3D rectangle)?

What if it needs to have valid words on the diagonals (idea credit: user645466); how would the algo for it be optimized?

PengOne · Accepted Answer

Given the dictionary of words of a given length, create two new dictionaries: The first contains all single letter prefixes of words (all letters that can be the first letter of a word of the given length), and the second contains all double letter prefixes of words (all sequences of two letters that can be the first two letters of a word of the given length). You can do triple prefixes as well, but you probably won't need to go beyond that.

Choose a word from the dictionary, call it X. This will be the first row of the matrix.
Check that X[1], X[2], ..., X[N] are all valid single letter prefixes using that handy list you made. If they are, go on to step 3; otherwise go back to step 1.
Choose a word from the dictionary, call it Y. This will be the second row of the matrix.
Check that X[1] Y[1], X[2] Y[2], ..., X[N] Y[N] are all valid double letter prefixes using that handy list you made. If they are, go on to step 5; otherwise go back to step 3. If this was the last word in the dictionary, then go all the way back to step 1.

...

2(N-1). Choose a word from the dictionary, call it Z. This will be the Nth row of the matrix.

2N. Check that X[1] Y[1] ... Z[1], X[2] Y[2] ... Z[2], ..., X[N] Y[N] ... Z[N] are all words in the dictionary. If they are, congrats, you've done it! Otherwise go back to step 2(N-1). If this was the last word in the dictionary, then go all the way back to step 2(n-3).

The logic is to build up the rectangle of words one row at a time, selecting words for the rows and then checking that the column could be completed to words. This will progress much faster than adding one letter at a time.

Per · Answer

Let D be the dictionary. Fix m and n. We can formulate the problem of finding an m × n rectangle as a constraint satisfaction problem (CSP).

x_i,1…x_i,n ∈ D ∀i ∈ {1, …, m}
x_1,j…x_m,j ∈ D ∀j ∈ {1, …, n}
x_i,j ∈ {A, …, Z} ∀i ∈ {1, …, m}, ∀j ∈ {1, …, n}

A very common approach for solving CSPs is to combine backtracking with constraint propagation. Loosely speaking, backtracking means we pick a variable, guess its value, and recursively solve the subproblem with one fewer variable, and constraint propagation means trying to reduce the number of possibilities for each variable (possibly to zero, which means there's no solution).

As an example, we might start a 3 × 3 grid by choosing x_1,1 = Q.

Q??
???
???

With an English dictionary, the only possibility for x_1,2 and x_2,1 is U (in before Scrabble “words”).

The art of solving CSPs is balancing between backtracking and constraint propagation. If we don't propagate constraints at all, then we're just using brute force. If we propagate constraints perfectly, then we don't need to backtrack, but a propagation algorithm that solves an NP-hard problem by itself is probably rather expensive to run.

In this problem, working with a large dictionary at each backtracking node will get expensive unless we have good data structure support. I'll outline an approach that uses a trie or a DAWG quickly to compute the set of letters via which a prefix extends to a complete word.

At each backtracking node, the set of variables we have assigned is a Young tableau. In other words, no variable is assigned until the variables above it and to the left have been assigned. In the diagram below, . denotes an assigned variable and * and ? denote unassigned variables.

.....*
...*??
...*??
..*???
*?????

The variables marked * are candidates for the next to be assigned a value. The advantage of having multiple choices rather choosing a fixed variable each time is that some variable orderings can be much better than others.

For each *, make two lookups into the trie/DAWG, one for the horizontal and one for the vertical, and compute the intersection of the sets of letters that can come next. One classic strategy is to choose the variable with the fewest possibilities in the hope that we reach a contradiction faster. I like this strategy because it prunes naturally when there's a variable with zero possibilities and propagates naturally when there's a variable with one. By caching results, we can make evaluating each node very fast.

largest possible rectangle of letters

Tags:

algorithm

optimization

np-hard

Adrian

2 Answers

PengOne

Per

Recent Activity

Donate For Us

largest possible rectangle of letters

Tags:

algorithm

optimization

np-hard

Adrian

2 Answers

PengOne

Per

Related questions

Recent Activity

Donate For Us