Consider the following arrangement of letters:
B
O A
N R I
D E N T
Start at the top letter and choose one of the two letters below, Plinko-style, until you reach the bottom. No matter what path you choose you create a four-letter word: BOND, BONE, BORE, BORN, BARE, BARN, BAIN, or BAIT. The fact that DENT reads across the bottom is just a nice coincidence.
I'd like help figuring out an algorithm that can design such a layout, where each possible path from the top to the bottom generates a distinct word from a (provided) dictionary. The inputs to the program would be a starting letter (B, in this example) and a word length n (4, in this example). It would return either the letters that make up such a layout, or a message saying that it's not possible. It doesn't have to be deterministic, so it could possibly generate different layouts with the same input.
I haven't thought of anything better than a brute-force approach so far. That is, for all 26^[(n+2)(n-1)/2]
ways of choosing letters for the bottom part of the layout, to check if all possible 2^(n-1)
paths give words that are in the dictionary. I considered some sort of prefix-tree, but the fact that paths can cross and share letters messed with me. I'm most comfortable in Python, but at minimum I'd just like the idea for an algorithm or approach that would work for this problem. Thanks.
Pretend the V W X Y Z
on the bottom here actually complete words.
B
A O
I R N
T N E D
V W X Y Z
We can implement a backtracking search with heuristics so strict, it seems unlikely any wrong path would go very far.
Insert all the n
sized words that begin with the same letter in a simple tree as below. Now perform a depth first search, asserting the following: each successive level needs one additional "shared" letter, meaning p(letter)
instances of it on that level, with the additional requirement that their two children are the same letters (e.g., the two R
s in parentheses on level 2 could be a "shared" letter because their children are the same).
What is p(letter)
? Pascal's triangle of course! n choose r
is exactly the number of instances of the letter needed on the relevant level of this simple tree, according to the Plinko board. On level 3, if we've chosen R
and R
, we'll need 3 N
s and 3 E
s to express the "shared" letters on that level. And each of the 3 N
s must have the same children letters (W,X in this case), and each of the 3 E
s must also (X,Y).
B
/ \
A O
/ \ / \
I (R) (R) N
/ \ / \ / \ / \
T (N) (N) E (N) E E D
V W W X W X X Y W X X Y X Y Y Z
4 W's, 6 X's, 4 Y's
Out of curiosity, here's some Python code :)
from itertools import combinations
from copy import deepcopy
# assumes words all start
# with the same letter and
# are of the same length
def insert(word, i, tree):
if i == len(word):
return
if word[i] in tree:
insert(word, i + 1, tree[word[i]])
else:
tree[word[i]] = {}
insert(word, i + 1, tree[word[i]])
# Pascal's triangle
def get_next_needed(needed):
next_needed = [[1, None, 0]] + [None] * (len(needed) - 1) + [[1, None, 0]]
for i, _ in enumerate(needed):
if i == len(needed) - 1:
next_needed[i + 1] = [1, None, 0]
else:
next_needed[i + 1] = [needed[i][0] + needed[i+1][0], None, 0]
return next_needed
def get_candidates(next_needed, chosen, parents):
global log
if log:
print "get_candidates: parents: %s" % parents
# For each chosen node we need two children.
# The corners have only one shared node, while
# the others in each group are identical AND
# must have all have a pair of children identical
# to the others' in the group. Additionally, the
# share sequence matches at the ends of each group.
# I (R) (R) N
# / \ / \ / \ / \
# T (N) (N) E (N) E E D
# Iterate over the parents, choosing
# two nodes for each one
def g(cs, s, seq, i, h):
if log:
print "cs, seq, s, i, h: %s, %s, %s, %s, %s" % (cs, s, seq, i, h)
# Base case, we've achieved a candidate sequence
if i == len(parents):
return [(cs, s, seq)]
# The left character in the corner is
# arbitrary; the next one, shared.
# Left corner:
if i == 0:
candidates = []
for (l, r) in combinations(chosen[0].keys(), 2):
_cs = deepcopy(cs)
_cs[0] = [1, l, 1]
_cs[1][1] = r
_cs[1][2] = 1
_s = s[:]
_s.extend([chosen[0][l], chosen[0][r]])
_h = deepcopy(h)
# save the indexes in cs of the
# nodes chosen for the parent
_h[parents[1]] = [1, 2]
candidates.extend(g(_cs, _s, l+r, 1, _h))
_cs = deepcopy(cs)
_cs[0] = [1, r, 1]
_cs[1][1] = l
_cs[1][2] = 1
_s = s[:]
_s.extend([chosen[0][r], chosen[0][l]])
_h = deepcopy(h)
# save the indexes in cs of the
# nodes chosen for the parent
_h[parents[1]] = [1, 2]
candidates.extend(g(_cs, _s, r+l, 1, _h))
if log:
print "returning candidates: %s" % candidates
return candidates
# The right character is arbitrary but the
# character before it must match the previous one.
if i == len(parents)-1:
l = cs[len(cs)-2][1]
if log:
print "rightmost_char: %s" % l
if len(chosen[i]) < 2 or (not l in chosen[i]):
if log:
print "match not found: len(chosen[i]) < 2 or (not l in chosen[i])"
return []
else:
result = []
for r in [x for x in chosen[i].keys() if x != l]:
_cs = deepcopy(cs)
_cs[len(cs)-2][2] = _cs[len(cs)-2][2] + 1
_cs[len(cs)-1] = [1, r, 1]
_s = s[:] + [chosen[i][l], chosen[i][r]]
result.append((_cs, _s, seq + l + r))
return result
parent = parents[i]
if log:
print "get_candidates: g: parent, i: %s, %s" % (parent, i)
_h = deepcopy(h)
if not parent in _h:
prev = _h[parents[i-1]]
_h[parent] = [prev[0] + 1, prev[1] + 1]
# parent left and right children
pl, pr = _h[parent]
if log:
print "pl, pr: %s, %s" % (pl, pr)
l = cs[pl][1]
if log:
print "rightmost_char: %s" % l
if len(chosen[i]) < 2 or (not l in chosen[i]):
if log:
print "match not found: len(chosen[i]) < 2 or (not l in chosen[i])"
return []
else:
# "Base case," parent nodes have been filled
# so this is a duplicate character on the same
# row, which needs a new assignment
if cs[pl][0] == cs[pl][2] and cs[pr][0] == cs[pr][2]:
if log:
print "TODO"
return []
# Case 2, right child is not assigned
if not cs[pr][1]:
candidates = []
for r in [x for x in chosen[i].keys() if x != l]:
_cs = deepcopy(cs)
_cs[pl][2] += 1
_cs[pr][1] = r
_cs[pr][2] = 1
_s = s[:]
_s.extend([chosen[i][l], chosen[i][r]])
# save the indexes in cs of the
# nodes chosen for the parent
candidates.extend(g(_cs, _s, seq+l+r, i+1, _h))
return candidates
# Case 3, right child is already assigned
elif cs[pr][1]:
r = cs[pr][1]
if not r in chosen[i]:
if log:
print "match not found: r ('%s') not in chosen[i]" % r
return []
else:
_cs = deepcopy(cs)
_cs[pl][2] += 1
_cs[pr][2] += 1
_s = s[:]
_s.extend([chosen[i][l], chosen[i][r]])
# save the indexes in cs of the
# nodes chosen for the parent
return g(_cs, _s, seq+l+r, i+1, _h)
# Otherwise, fail
return []
return g(next_needed, [], "", 0, {})
def f(words, n):
global log
tree = {}
for w in words:
insert(w, 0, tree)
stack = []
root = tree[words[0][0]]
head = words[0][0]
for (l, r) in combinations(root.keys(), 2):
# (shared-chars-needed, chosen-nodes, board)
stack.append(([[1, None, 0],[1, None, 0]], [root[l], root[r]], [head, l + r], [head, l + r]))
while stack:
needed, chosen, seqs, board = stack.pop()
if log:
print "chosen: %s" % chosen
print "board: %s" % board
# Return early for demonstration
if len(board) == n:
# [y for x in chosen for y in x[1]]
return board
next_needed = get_next_needed(needed)
candidates = get_candidates(next_needed, chosen, seqs[-1])
for cs, s, seq in candidates:
if log:
print " cs: %s" % cs
print " s: %s" % s
print " seq: %s" % seq
_board = board[:]
_board.append("".join([x[1] for x in cs]))
_seqs = seqs[:]
_seqs.append(seq)
stack.append((cs, s, _seqs, _board))
"""
B
A O
I R N
T N E D
Z Y X W V
"""
words = [
"BONDV",
"BONDW",
"BONEW",
"BONEX",
"BOREW",
"BOREX",
"BAREW",
"BAREX",
"BORNX",
"BORNY",
"BARNX",
"BARNY",
"BAINX",
"BAINY",
"BAITY",
"BAITZ"]
N = 5
log = True
import time
start_time = time.time()
solution = f(list(words), N)
print ""
print ""
print("--- %s seconds ---" % (time.time() - start_time))
print "solution: %s" % solution
print ""
if solution:
for i, row in enumerate(solution):
print " " * (N - 1 - i) + " ".join(row)
print ""
print "words: %s" % words
I find this a quite interesting problem.
The first attempt was a random solver; in other words it just fills the triangle with letters and then counts how many "errors" are present (words not in the dictionary). Then a hill-climbing is performed by changing a one or more letter randomly and seeing if the error improves; if the error remains the same the changes are still accepted (so making a random-walk on plateau areas).
Amazingly enough this can solve in reasonable time non-obvious problems like 5-letter words starting with 'b':
b
a u
l n r
l d g s
o y s a e
I then tried a full-search approach to be able to answer also "no-solution" part and the idea was to write a recursive search:
Just write down all acceptable words on the left side; e.g.
b
a ?
l ? ?
l ? ? ?
o ? ? ? ?
and call recursively until we find an acceptable solution or fail
Write down all acceptable words on the right side if the second letter is greater than the second letter of the first word, e.g.
b
a u
l ? r
l ? ? k
o ? ? ? e
This is done to avoid searching symmetric solutions (for any given solution another one can be obtained by simply mirroring on the X axis)
In the general case the first question mark is replaced with all letters in the alphabet if for all words that use the chosen question mark either
If no solution is found for the specific question mark chosen there's no point in keeping searching so False
is returned. Probably using some heuristics for choosing which question mark for fill first would speed up search, I didn't investigate that possibility.
For the case 2 (searching if there are compatible words) I am creating 26*(N-1)
sets of words that have a prescribed character in a certain position (position 1 is not considered) and I'm using set intersection on all non-question-mark characters.
This approach is able to tell in about 30 seconds (PyPy) that there are no solution for 5-letter words starting with w
(there are 468 words in the dictionary with that starting letter).
The code for this implementation can be seen at
https://gist.github.com/6502/26552858e93ce4d4ec3a8ef46100df79
(the program expects a file named words_alpha.txt
containing all valid words and then must be called specifiying the initial letter and the size;
as dictionary I used the file from https://github.com/dwyl/english-words)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With