Shortest path to transform one word into another

Tags:

algorithm

shortest-path

edit-distance

hamming-distance

For a Data Structures project, I must find the shortest path between two words (like "cat" and "dog"), changing only one letter at a time. We are given a Scrabble word list to use in finding our path. For example:

cat -> bat -> bet -> bot -> bog -> dog

I've solved the problem using a breadth first search, but am seeking something better (I represented the dictionary with a trie).

Please give me some ideas for a more efficient method (in terms of speed and memory). Something ridiculous and/or challenging is preferred.

I asked one of my friends (he's a junior) and he said that there is no efficient solution to this problem. He said I would learn why when I took the algorithms course. Any comments on that?

We must move from word to word. We cannot go cat -> dat -> dag -> dog. We also have to print out the traversal.

asked Oct 05 '09 19:10

dacman

2 Answers

NEW ANSWER

Given the recent update, you could try A* with the Hamming distance as a heuristic. It's an admissible heuristic: each step changes exactly one letter, so the Hamming distance never overestimates the number of steps still needed.
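A minimal sketch of that idea, assuming all words have the same length, use lowercase ASCII letters, and are held in a Python set called `words` (the names `hamming` and `astar_ladder` are made up for this example):

```python
import heapq
from string import ascii_lowercase


def hamming(a, b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def astar_ladder(start, goal, words):
    """Return a shortest ladder from start to goal as a list, or None."""
    # Heap entries are (f, g, word) with f = g + hamming(word, goal).
    open_heap = [(hamming(start, goal), 0, start)]
    best_g = {start: 0}
    parent = {start: None}
    while open_heap:
        _, g, word = heapq.heappop(open_heap)
        if word == goal:
            path = []
            while word is not None:
                path.append(word)
                word = parent[word]
            return path[::-1]
        if g > best_g[word]:
            continue  # stale heap entry, a cheaper route was found later
        for i in range(len(word)):
            for c in ascii_lowercase:
                cand = word[:i] + c + word[i + 1:]
                if cand == word or cand not in words:
                    continue
                new_g = g + 1
                if new_g < best_g.get(cand, float("inf")):
                    best_g[cand] = new_g
                    parent[cand] = word
                    heapq.heappush(
                        open_heap, (new_g + hamming(cand, goal), new_g, cand)
                    )
    return None
```

In the worst case this still visits every reachable word, so it improves typical running time rather than the bound.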

OLD ANSWER

You can modify the dynamic program used to compute the Levenshtein distance to obtain the sequence of operations.

EDIT: If there are a constant number of strings, the problem is solvable in polynomial time. Otherwise, it's NP-hard (it's all there on Wikipedia), assuming your friend is talking about the problem being NP-hard.
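As a rough sketch of what that modification could look like: fill the usual distance table, then walk back from the bottom-right corner to recover the operations. The function name `edit_script` and the tuple layout of the operations are assumptions for this example. Note that it never consults the word list, so the intermediate strings need not be dictionary words.

```python
def edit_script(a, b):
    """Return (distance, ops) turning a into b with single-character edits."""
    m, n = len(a), len(b)
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i
    for j in range(n + 1):
        dist[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # delete a[i-1]
                dist[i][j - 1] + 1,         # insert b[j-1]
                dist[i - 1][j - 1] + cost,  # match or substitute
            )
    # Walk back through the table to recover one optimal edit sequence.
    ops = []
    i, j = m, n
    while i > 0 or j > 0:
        diag_cost = 1 if i > 0 and j > 0 and a[i - 1] != b[j - 1] else 0
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + diag_cost:
            if diag_cost:
                ops.append(("substitute", i - 1, a[i - 1], b[j - 1]))
            i -= 1
            j -= 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("delete", i - 1, a[i - 1], None))
            i -= 1
        else:
            ops.append(("insert", i, None, b[j - 1]))
            j -= 1
    ops.reverse()
    return dist[m][n], ops
```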

EDIT: If your strings are of equal length, you can use Hamming distance.

answered Sep 22 '22 07:09

Jacob

With a dictionary, BFS is optimal, but the running time needed is proportional to the size of the graph (V+E). With n letters, the dictionary might have ~a^n entries, where a is the alphabet size. If the dictionary contains all words except the one that should be at the end of the chain, then you'll traverse all possible words but won't find anything. This is graph traversal, but the size might be exponentially large.
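For reference, a plain BFS over the word graph might look like the sketch below. It assumes equal-length lowercase words stored in a Python set instead of the trie from the question; `ladder_neighbors` and `shortest_ladder` are names invented for this example.

```python
from collections import deque
from string import ascii_lowercase


def ladder_neighbors(word, words):
    """Yield every dictionary word that differs from word in one letter."""
    for i in range(len(word)):
        for c in ascii_lowercase:
            if c != word[i]:
                cand = word[:i] + c + word[i + 1:]
                if cand in words:
                    yield cand


def shortest_ladder(start, goal, words):
    """Return a shortest ladder from start to goal as a list, or None."""
    parent = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        word = queue.popleft()
        if word == goal:
            path = []
            while word is not None:
                path.append(word)
                word = parent[word]
            return path[::-1]
        for nxt in ladder_neighbors(word, words):
            if nxt not in parent:
                parent[nxt] = word
                queue.append(nxt)
    return None
```

With only the six words from the question's example in the set, this returns `cat -> bat -> bot -> bog -> dog`, because `bat` and `bot` already differ by a single letter.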

You may wonder if it is possible to do it faster - to browse the structure "intelligently" and do it in polynomial time. The answer is, I think, no.

The problem:

You're given a fast (linear-time) way to check whether a word is in the dictionary and two words u, v, and you are to check whether there's a sequence u -> a₁ -> a₂ -> ... -> a_n -> v.

is NP-hard.

Proof: Take some 3SAT instance, like

(p or q or not r) and (p or not q or r)

You'll start with 0 000 00 and are to check if it is possible to go to 2 222 22.

The first character will be "are we finished", the next three bits will control p, q, r, and the last two will control the clauses.

Allowed words are:

Anything that starts with 0 and contains only 0's and 1's.
Anything that starts with 2 and is legal. This means that it consists of 0's and 1's (except that the first character is 2), all clause bits are set correctly according to the variable bits, and all clause bits are 1 (so this shows that the formula is satisfiable).
Anything that starts with at least two 2's and then is composed of 0's and 1's (regular expression: 222*(0+1)*, like 22221101 but not 2212001).

To produce 2 222 22 from 0 000 00, you have to do it in this way:

(1) Flip the appropriate bits, e.g. to get 0 100 11 in three steps (p = 1, both clause bits set to 1). This requires finding a 3SAT solution.

(2) Change the first bit to 2: 2 100 11. Here the dictionary verifies that this is indeed a 3SAT solution.

(3) Change 2 100 11 -> 2 200 11 -> 2 220 11 -> 2 222 11 -> 2 222 21 -> 2 222 22.

These rules enforce that you can't cheat (check). Going to 2 222 22 is possible only if the formula is satisfiable, and checking that is NP-hard. I feel it might be even harder (#P or FNP probably), but NP-hardness is enough for that purpose, I think.
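To make the construction concrete, here is a small sketch of the membership oracle for the three rules above, plus a brute-force BFS over the alphabet 0/1/2 to check reachability. The clause encoding (lists of `(variable_index, is_positive)` pairs) and the names `is_allowed` and `ladder_steps` are assumptions made for this example, not part of the original argument.

```python
import re
from collections import deque

# (p or q or not r) and (p or not q or r), with p, q, r as variables 0, 1, 2.
EXAMPLE_CLAUSES = [
    [(0, True), (1, True), (2, False)],
    [(0, True), (1, False), (2, True)],
]


def is_allowed(word, num_vars, clauses):
    """Dictionary membership test implementing the three rules."""
    if len(word) != 1 + num_vars + len(clauses):
        return False
    if word[0] == "0":  # rule 1: starts with 0, only 0's and 1's
        return set(word) <= {"0", "1"}
    if re.fullmatch(r"22+[01]*", word):  # rule 3: at least two 2's, then bits
        return True
    if word[0] == "2" and set(word[1:]) <= {"0", "1"}:  # rule 2: legal word
        assignment = [c == "1" for c in word[1:1 + num_vars]]
        clause_bits = word[1 + num_vars:]
        return all(
            bit == "1" and any(assignment[v] == positive for v, positive in clause)
            for bit, clause in zip(clause_bits, clauses)
        )
    return False


def ladder_steps(num_vars, clauses):
    """Steps in a shortest ladder from 00...0 to 22...2, or None if none exists."""
    length = 1 + num_vars + len(clauses)
    start, goal = "0" * length, "2" * length
    dist = {start: 0}
    queue = deque([start])
    while queue:
        word = queue.popleft()
        if word == goal:
            return dist[word]
        for i in range(length):
            for c in "012":
                cand = word[:i] + c + word[i + 1:]
                if cand not in dist and is_allowed(cand, num_vars, clauses):
                    dist[cand] = dist[word] + 1
                    queue.append(cand)
    return None
```

For the example formula the goal is reachable, while an unsatisfiable formula such as (p) and (not p) leaves no legal word starting with a single 2, so the BFS returns `None`. The search itself takes exponential time in the number of variables, which is the point of the reduction.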

Edit: You might be interested in the disjoint-set data structure. This will take your dictionary and group words that can be reached from each other. You can also store a path from every vertex to the root or some other vertex. This will give you a path, not necessarily the shortest one.
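A minimal union-find sketch of the grouping step, assuming the words contain no `*` character (it is used here as a wildcard to bucket words that differ in one position); `build_components` is a name invented for this example. It only answers whether two words are connected and does not store paths.

```python
def build_components(words):
    """Group words into connected components; returns a find(word) function."""
    parent = {w: w for w in words}

    def find(w):
        # Path halving keeps the trees shallow.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Words sharing a pattern such as "c*t" differ in at most one letter.
    buckets = {}
    for w in words:
        for i in range(len(w)):
            buckets.setdefault(w[:i] + "*" + w[i + 1:], []).append(w)
    for group in buckets.values():
        for other in group[1:]:
            union(group[0], other)
    return find
```

Two words can be joined by a ladder exactly when `find` returns the same representative for both, so this can reject hopeless queries before running any search.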

answered Sep 23 '22 07:09

sdcvvc
