How to compute "shortest distance" between two words?

Question

Recently I had an interview and I was asked to write a algorithm to find the minimum number of 1 letter changes to get from a particular of word to a given word , i.e. Cat->Cot->Cog->Dog

I dont want the solution of the problem just guide me through How I can use BFS in this algorithm ?

robert king · Accepted Answer

according to this scrabble list, the shortest path between cat and dog is: ['CAT', 'COT', 'COG', 'DOG']

from urllib import urlopen

def get_words():
    try:
        html = open('three_letter_words.txt').read()
    except IOError:
        html = urlopen('http://www.yak.net/kablooey/scrabble/3letterwords.html').read()
        with open('three_letter_words.txt', 'w') as f:
            f.write(html)

    b = html.find('<PRE>') #ignore the html before the <pre>
    while True:
        a = html.find("<B>", b) + 3
        b = html.find("</B>", a)
        word = html[a: b]
        if word == "ZZZ":
            break
        assert(len(word) == 3)
        yield word

words = list(get_words())

def get_template(word):
    c1, c2, c3 = word[0], word[1], word[2]
    t1 = 1, c1, c2
    t2 = 2, c1, c3
    t3 = 3, c2, c3
    return t1, t2, t3

d = {}
for word in words:
    template = get_template(word)
    for ti in template:
        d[ti] = d.get(ti, []) + [word] #add the word to the set of words with that template

for ti in get_template('COG'):
    print d[ti]
#['COB', 'COD', 'COG', 'COL', 'CON', 'COO', 'COO', 'COP', 'COR', 'COS', 'COT', 'COW', 'COX', 'COY', 'COZ']
#['CIG', 'COG']
# ['BOG', 'COG', 'DOG', 'FOG', 'HOG', 'JOG', 'LOG', 'MOG', 'NOG', 'TOG', 'WOG']

import networkx
G = networkx.Graph()

for word_list in d.values():
    for word1 in word_list:
        for word2 in word_list:
            if word1 != word2:
                G.add_edge(word1, word2)

print G['COG']
#{'COP': {}, 'COS': {}, 'COR': {}, 'CIG': {}, 'COT': {}, 'COW': {}, 'COY': {}, 'COX': {}, 'COZ': {}, 'DOG': {}, 'CON': {}, 'COB': {}, 'COD': {}, 'COL': {}, 'COO': {}, 'LOG': {}, 'TOG': {}, 'JOG': {}, 'BOG': {}, 'HOG': {}, 'FOG': {}, 'WOG': {}, 'NOG': {}, 'MOG': {}}

print networkx.shortest_path(G, 'CAT', 'DOG')
['CAT', 'OCA', 'DOC', 'DOG']

As a bonus we can get the farthest:

print max(networkx.all_pairs_shortest_path(G, 'CAT')['CAT'].values(), key=len)
#['CAT', 'CAP', 'YAP', 'YUP', 'YUK']

janisz · Answer

At first sight I thaught about Levenshtein distance but you need to use BFS. So I think that you should start from building tree. Given word should be root and then next nodes are words with changed first letter. Next next nodes have changed second letter. When you build the graph you use BFS and when you found new word store the path length. At the end of algorithm choose minimal distance.

How to compute "shortest distance" between two words?

Tags:

algorithm

data-structures

graph-theory

Dude

2 Answers

robert king

janisz

Recent Activity

Donate For Us

How to compute "shortest distance" between two words?

Tags:

algorithm

data-structures

graph-theory

Dude

2 Answers

robert king

janisz

Related questions

Recent Activity

Donate For Us