I'm working through Allen Downey's How To Think Like A Computer Scientist, and I've written what I believe to be a functionally correct solution to Exercise 10.10. But it took just over 10 hours (!) to run, so I'm wondering if I'm missing some really obvious and helpful optimization.
Here's the Exercise:
"Two words 'interlock' if taking alternating letters from each forms a new word. For example, 'shoe' and 'cold' interlock to form 'schooled'. Write a program that finds all pairs of words that interlock. Hint: Don't enumerate all pairs!"
(For these word-list problems, Downey has supplied a file with 113809 words. We may assume that these words are in a list, one word per item in the list.)
Here's my solution:
from bisect import bisect_left
def index(lst, target):
"""If target is in list, returns the index of target; otherwise returns None"""
i = bisect_left(lst, target)
if i != len(lst) and lst[i] == target:
return i
else:
return None
def interlock(str1, str2):
"Takes two strings of equal length and 'interlocks' them."
if len(str1) == len(str2):
lst1 = list(str1)
lst2 = list(str2)
result = []
for i in range(len(lst1)):
result.append(lst1[i])
result.append(lst2[i])
return ''.join(result)
else:
return None
def interlockings(word_lst):
"""Checks each pair of equal-length words to see if their interlocking is a word; prints each successful pair and the total number of successful pairs."""
total = 0
for i in range(1, 12): # 12 because max word length is 22
# to shorten the loops, get a sublist of words of equal length
sub_lst = filter(lambda(x): len(x) == i, word_lst)
for word1 in sub_lst[:-1]:
for word2 in sub_lst[sub_lst.index(word1)+1:]: # pair word1 only with words that come after word1
word1word2 = interlock(word1, word2) # interlock word1 with word2
word2word1 = interlock(word2, word1) # interlock word2 with word1
if index(lst, word1word2): # check to see if word1word2 is actually a word
total += 1
print "Word 1: %s, Word 2: %s, Interlock: %s" % (word1, word2, word1word2)
if index(lst, word2word1): # check to see if word2word1 is actually a word
total += 1
print "Word 2, %s, Word 1: %s, Interlock: %s" % (word2, word1, word2word1)
print "Total interlockings: ", total
The print statements are not the problem; my program found only 652 such pairs. The problem is the nested loops, right? I mean, even though I'm looping over lists that contain only words of the same length, there are (for example) 21727 words of length 7, which means my program has to check over 400 million "interlockings" to see if they're actual words---and that's just for the length-7 words.
So again, this code took 10 hours to run (and found no pairs involving words of length 5 or greater, in case you were curious). Is there a better way to solve this problem?
Thanks in advance for any and all insight. I'm aware that "premature optimization is the root of all evil"---and perhaps I've fallen into that trap already---but in general, while I can usually write code that runs correctly, I often struggle with writing code that runs well.
Do it the other way around: Iterate through all words and split them into two words by taking the odd and even letters. Then look up those two words in the dictionary.
As a side node, the two words that interlock must not necessarily have the same length -- the lengths might also differ by 1.
Some (untested) code:
words = set(line.strip() for line in open("words"))
for w in words:
even, odd = w[::2], w[1::2]
if even in words and odd in words:
print even, odd
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With