Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to optimize this Python code (from ThinkPython, Exercise 10.10)

Tags:

python

I'm working through Allen Downey's How To Think Like A Computer Scientist, and I've written what I believe to be a functionally correct solution to Exercise 10.10. But it took just over 10 hours (!) to run, so I'm wondering if I'm missing some really obvious and helpful optimization.

Here's the Exercise:

"Two words 'interlock' if taking alternating letters from each forms a new word. For example, 'shoe' and 'cold' interlock to form 'schooled'. Write a program that finds all pairs of words that interlock. Hint: Don't enumerate all pairs!"

(For these word-list problems, Downey has supplied a file with 113809 words. We may assume that these words are in a list, one word per item in the list.)

Here's my solution:

from bisect import bisect_left

def index(lst, target):
    """If target is in list, returns the index of target; otherwise returns None"""
    i = bisect_left(lst, target)
    if i != len(lst) and lst[i] == target:
        return i
    else:
        return None

def interlock(str1, str2):
    "Takes two strings of equal length and 'interlocks' them."
    if len(str1) == len(str2):
        lst1 = list(str1)
        lst2 = list(str2)
        result = []
        for i in range(len(lst1)):
            result.append(lst1[i])
            result.append(lst2[i])
        return ''.join(result)
    else:
        return None

def interlockings(word_lst):
    """Checks each pair of equal-length words to see if their interlocking is a word; prints each successful pair and the total number of successful pairs."""
    total = 0
    for i in range(1, 12):  # 12 because max word length is 22
        # to shorten the loops, get a sublist of words of equal length
        sub_lst = filter(lambda(x): len(x) == i, word_lst)
        for word1 in sub_lst[:-1]:
            for word2 in sub_lst[sub_lst.index(word1)+1:]: # pair word1 only with words that come after word1
                word1word2 = interlock(word1, word2) # interlock word1 with word2
                word2word1 = interlock(word2, word1) # interlock word2 with word1
                if index(lst, word1word2): # check to see if word1word2 is actually a word
                    total += 1
                    print "Word 1: %s, Word 2: %s, Interlock: %s" % (word1, word2, word1word2)
                if index(lst, word2word1): # check to see if word2word1 is actually a word
                    total += 1
                    print "Word 2, %s, Word 1: %s, Interlock: %s" % (word2, word1, word2word1)
    print "Total interlockings: ", total

The print statements are not the problem; my program found only 652 such pairs. The problem is the nested loops, right? I mean, even though I'm looping over lists that contain only words of the same length, there are (for example) 21727 words of length 7, which means my program has to check over 400 million "interlockings" to see if they're actual words---and that's just for the length-7 words.

So again, this code took 10 hours to run (and found no pairs involving words of length 5 or greater, in case you were curious). Is there a better way to solve this problem?

Thanks in advance for any and all insight. I'm aware that "premature optimization is the root of all evil"---and perhaps I've fallen into that trap already---but in general, while I can usually write code that runs correctly, I often struggle with writing code that runs well.

like image 286
Alex Basson Avatar asked Apr 02 '11 12:04

Alex Basson


1 Answers

Do it the other way around: Iterate through all words and split them into two words by taking the odd and even letters. Then look up those two words in the dictionary.

As a side node, the two words that interlock must not necessarily have the same length -- the lengths might also differ by 1.

Some (untested) code:

words = set(line.strip() for line in open("words"))
for w in words:
    even, odd = w[::2], w[1::2]
    if even in words and odd in words:
        print even, odd
like image 180
Sven Marnach Avatar answered Sep 19 '22 23:09

Sven Marnach