I guess you could classify this as a Scrabble style problem, but it started out due to a friend mentioning the UK TV quiz show Countdown. Various rounds in the show involve the contestants being presented a scrambled set of letters and they have to come up with the longest word they can. The one my friend mentioned was "RAEPKWAEN".
In fairly short order I whipped up something in Python to handle this problem, using PyEnchant to handle the dictionary look-ups, however I'm noticing that it really can't scale all that well.
Here's what I have currently:
#!/usr/bin/python
from itertools import permutations
import enchant
from sys import argv
def find_longest(origin):
s = enchant.Dict("en_US")
for i in range(len(origin),0,-1):
print "Checking against words of length %d" % i
pool = permutations(origin,i)
for comb in pool:
word = ''.join(comb)
if s.check(word):
return word
return ""
if (__name__)== '__main__':
result = find_longest(argv[1])
print result
That's fine on a 9 letter example like they use in the show, 9 factorial = 362,880 and 8 factorial = 40,320. On that scale even if it would have to check all possible permutations and word lengths it's not that many.
However once you reach 14 characters that's 87,178,291,200 possibly combinations, meaning you're reliant on luck that a 14 character word is quickly found.
With the example word above it's taking my machine about 12 1/2 seconds to find "reawaken". With 14 character scrambled words we could be talking on the scale of 23 days just to check all possible 14 character permutations.
Is there any more efficient way to handle this?
An anagram is a word or phrase formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly once. For example, the word anagram itself can be rearranged into nag a ram, also the word binary into brainy and the word adobe into abode.
to put things such as words or letters in the wrong order so that they do not make sense: He had a habit of scrambling his words when excited.
Implementation of Jeroen Coupé idea from his answer with letters count:
from collections import defaultdict, Counter
def find_longest(origin, known_words):
return iter_longest(origin, known_words).next()
def iter_longest(origin, known_words, min_length=1):
origin_map = Counter(origin)
for i in xrange(len(origin) + 1, min_length - 1, -1):
for word in known_words[i]:
if check_same_letters(origin_map, word):
yield word
def check_same_letters(origin_map, word):
new_map = Counter(word)
return all(new_map[let] <= origin_map[let] for let in word)
def load_words_from(file_path):
known_words = defaultdict(list)
with open(file_path) as f:
for line in f:
word = line.strip()
known_words[len(word)].append(word)
return known_words
if __name__ == '__main__':
known_words = load_words_from('words_list.txt')
origin = 'raepkwaen'
big_origin = 'raepkwaenaqwertyuiopasdfghjklzxcvbnmqwertyuiopasdfghjklzxcvbnmqwertyuiopasdfghjklzxcvbnmqwertyuiopasdfghjklzxcvbnm'
print find_longest(big_origin, known_words)
print list(iter_longest(origin, known_words, 5))
Output (for my small 58000 words dict):
counterrevolutionaries
['reawaken', 'awaken', 'enwrap', 'weaken', 'weaker', 'apnea', 'arena', 'awake',
'aware', 'newer', 'paean', 'parka', 'pekan', 'prank', 'prawn', 'preen', 'renew',
'waken', 'wreak']
Notes:
It's simple implementation without optimizations.
words_list.txt
- can be /usr/share/dict/words
on Linux.
UPDATE
In case we need to find word only once, and we have dictionary with words sorted by length, e.g. by this script:
with open('words_list.txt') as f:
words = f.readlines()
with open('words_by_len.txt', 'w') as f:
for word in sorted(words, key=lambda w: len(w), reverse=True):
f.write(word)
We can find longest word without loading full dict to memory:
from collections import Counter
import sys
def check_same_letters(origin_map, word):
new_map = Counter(word)
return all(new_map[let] <= origin_map[let] for let in word)
def iter_longest_from_file(origin, file_path, min_length=1):
origin_map = Counter(origin)
origin_len = len(origin)
with open(file_path) as f:
for line in f:
word = line.strip()
if len(word) > origin_len:
continue
if len(word) < min_length:
return
if check_same_letters(origin_map, word):
yield word
def find_longest_from_file(origin, file_path):
return iter_longest_from_file(origin, file_path).next()
if __name__ == '__main__':
origin = sys.argv[1] if len(sys.argv) > 1 else 'abcdefghijklmnopqrstuvwxyz'
print find_longest_from_file(origin, 'words_by_len.txt')
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With