
How many common English words of 4 letters or more can you make from the letters of a given word (each letter can only be used once)

On the back of a block calendar I found the following riddle:

How many common English words of 4 letters or more can you make from the letters of the word 'textbook' (each letter can only be used once).

My first solution that I came up with was:

from itertools import permutations

with open('/usr/share/dict/words') as f:
    words = f.readlines()

words = map(lambda x: x.strip(), words)

given_word = 'textbook'

found_words = []

ps = (permutations(given_word, i) for i in range(4, len(given_word)+1))

for p in ps:
    for word in map(''.join, p):
        if word in words and word != given_word:
            found_words.append(word)
print set(found_words)  

This gives the result set(['tote', 'oboe', 'text', 'boot', 'took', 'toot', 'book', 'toke', 'betook']) but took more than 7 minutes on my machine.
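
A rough back-of-the-envelope count (just a sketch, assuming a word list of roughly 100k entries) shows why it is so slow: the generator produces over a hundred thousand candidate strings, and each `word in words` test scans the whole list.

# Sketch: how many candidate strings the permutation approach generates.
from math import factorial

n = len('textbook')                                          # 8 letters
candidates = sum(factorial(n) // factorial(n - k) for k in range(4, n + 1))
print(candidates)                                            # 109200
# Each candidate is tested with `word in words` against a plain list of
# ~100k entries, so the inner loop performs billions of string comparisons;
# a set would make each membership test O(1).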

My next iteration was:

with open('/usr/share/dict/words') as f:
    words = f.readlines()

words = map(lambda x: x.strip(), words)

given_word = 'textbook'

print [word for word in words if len(word) >= 4 and sorted(filter(lambda letter: letter in word, given_word)) == sorted(word) and word != given_word]

This returns an answer almost immediately, but gives: ['book', 'oboe', 'text', 'toot']
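
A quick check (just an illustrative sketch) shows why this version misses words such as 'boot': the filter keeps every occurrence of a matching letter in 'textbook', so the two sorted lists differ whenever 'textbook' contains a letter more often than the candidate word does.

given_word = 'textbook'
word = 'boot'
print(sorted(filter(lambda letter: letter in word, given_word)))
# ['b', 'o', 'o', 't', 't']  <- both t's and both o's of 'textbook' survive
print(sorted(word))
# ['b', 'o', 'o', 't']       <- not equal, so 'boot' is rejected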

What is the fastest, correct and most pythonic solution to this problem?

(edit: added my earlier permutations solution and its different output).

asked Jan 15 '12 by BioGeek


3 Answers

I thought I'd share this slightly interesting trick, although it takes a good bit more code than the other solutions and isn't really "pythonic". It should be rather quick, though, judging by the timings reported for the other solutions.

We do a bit of preprocessing to speed up the computation. The basic approach is the following: we assign every letter of the alphabet a prime number, e.g. A = 2, B = 3, and so on. We then compute a hash for every word in the word list, which is simply the product of the primes assigned to its characters, and store every word in a dictionary indexed by that hash.

Now, if we want to find out which words are equivalent to, say, 'textbook', we only have to compute the same hash for that word and look it up in our dictionary. Usually (say, in C++) we'd have to worry about overflow, but in Python it's even simpler than that: every word stored under the same hash contains exactly the same characters, with the same multiplicities.
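
As a tiny illustration (with made-up prime assignments, say B → 3, O → 43, T → 67; the exact primes don't matter), anagrams always multiply out to the same hash:

# illustrative only: arbitrary primes for the letters B, O and T
B, O, T = 3, 43, 67
print(B * O * O * T)   # 'BOOT' -> 371649
print(T * O * O * B)   # 'TOOB' -> 371649, so both land in the same bucket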

Here's the code, with the slight optimization that we only have to handle characters in a limited range, which means we can get by with a much smaller prime table than otherwise (the obvious further optimization would be to assign values only to the characters that actually appear in the given word; it was fast enough anyhow, so I didn't bother, and this way we can preprocess once and reuse the table for several words). A prime-generating algorithm is useful often enough that you should have one of your own anyhow ;)

from collections import defaultdict

def gen_primes(count):
    # simple trial-division generator for the first `count` primes;
    # any prime generator will do here
    primes = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
            yield candidate
        candidate += 1

# one prime per character, indexed from ord(' ') upwards
PRIMES = list(gen_primes(256))

def get_dict(path):
    # group every word in the word list by its prime-product hash
    res = defaultdict(list)
    with open(path, "r") as f:
        for line in f:
            word = line.strip().upper()
            res[compute_hash(word)].append(word)
    return res

def compute_hash(word):
    h = 1
    for char in word:
        try:
            h *= PRIMES[ord(char) - ord(' ')]
        except IndexError:
            # contains some character out of range - always 0 for our purposes
            return 0
    return h

def get_result(path, given_word):
    words = get_dict(path)
    given_word = given_word.upper()
    result = set()
    # all subsequences ("powerset") of the given word's letters
    powerset = lambda x: powerset(x[1:]) + [x[:1] + y for y in powerset(x[1:])] if x else [x]
    for part in (p for p in powerset(given_word) if len(p) >= 4):
        for equiv in words[compute_hash(part)]:
            result.add(equiv)
    return result

if __name__ == '__main__':
    path = "dict.txt"
    given_word = "textbook"
    result = get_result(path, given_word)
    print(result)

Runs rather quickly on my Ubuntu word list (98k words), but it's not what I'd call pythonic since it's basically a port of a C++ algorithm. Useful if you want to compare more than one word this way.
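
For that multi-word case, one possible refactoring (just a sketch; lookup_word is a hypothetical helper, not part of the code above, and it reuses get_dict and compute_hash from above) is to build the table once and then query it as often as you like:

# Hypothetical sketch: reuse the hash table from get_dict for several words.
def lookup_word(table, given_word):
    given_word = given_word.upper()
    powerset = lambda x: powerset(x[1:]) + [x[:1] + y for y in powerset(x[1:])] if x else [x]
    result = set()
    for part in (p for p in powerset(given_word) if len(p) >= 4):
        result.update(table[compute_hash(part)])
    return result

table = get_dict("dict.txt")              # read and hash the word list once
for w in ("textbook", "calendar"):        # then query any number of words
    print(sorted(lookup_word(table, w)))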

answered by Voo


How about this?

from itertools import permutations, chain

with open('/usr/share/dict/words') as fp:
    words = set(fp.read().split())

given_word = 'textbook'

perms = (permutations(given_word, i) for i in range(4, len(given_word)+1))
pwords = (''.join(p) for p in chain(*perms))
matches = words.intersection(pwords)

print matches

which gives

>>> print matches
set(['textbook', 'keto', 'obex', 'tote', 'oboe', 'text', 'boot', 'toto', 'took', 'koto', 'bott', 'tobe', 'boke', 'toot', 'book', 'bote', 'otto', 'toke', 'toko', 'oket'])
answered by DSM


There is a generator itertools.permutations with which you can gather all permutations of a sequence with a specified length. That makes it easier:

from itertools import permutations

GIVEN_WORD = 'textbook'

with open('/usr/share/dict/words', 'r') as f:
    words = [s.strip() for s in f.readlines()]

print len(filter(lambda x: ''.join(x) in words, permutations(GIVEN_WORD, 4)))

Edit #1: Oh! It says "4 or more" ;) Forget what I said!

Edit #2: This is the second version I came up with:

GIVEN_WORD = 'textbook'
LETTERS = set(GIVEN_WORD)

with open('/usr/share/dict/words') as f:
    WORDS = filter(lambda x: len(x) >= 4, [l.strip() for l in f])

# keep words that only use letters from 'textbook', each letter at most as
# often as it occurs there
matching = filter(lambda x: set(x).issubset(LETTERS) and
                            all(x.count(c) <= GIVEN_WORD.count(c) for c in x),
                  WORDS)
print len(matching)
answered by Gandaro