Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding a list of adjacent words between two words

I am working on a programming challenge for practice and am having trouble finding a good data structure/algorithm to use to implement a solution.

Background:

Call two words “adjacent” if you can change one word into the other by adding, deleting, or changing a single letter.

A “word list” is an ordered list of unique words where successive words are adjacent.

The problem:

Write a program which takes two words as inputs and walks through the dictionary and creates a list of words between them.

Examples:

hate  → love:     hate, have, hove, love
dogs  → wolves:   dogs, does, doles, soles, solves, wolves
man   → woman:    man, ran, roan, roman, woman
flour → flower:   flour, lour, dour, doer, dower, lower, flower

I am not quite sure how to approach this problem, my first attempt involved creating permutations of the first word then trying to replace letters in it. My second thought was maybe something like a suffix tree

Any thoughts or ideas toward at least breaking the problem down would be appreciated. Keep in mind that this is not homework, but a programming challenge I am working on myself.

like image 616
Bob Templ Avatar asked Apr 19 '13 02:04

Bob Templ


3 Answers

This puzzle was first stated by Charles Dodgson, who wrote Alice's Adventures in Wonderland under his pseudonym Lewis Carroll.

The basic idea is to create a graph structure in which the nodes are words in a dictionary and the edges connect words that are one letter apart, then do a breadth-first search through the graph, starting at the first word, until you find the second word.

I discuss this problem, and give an implementation that includes a clever algorithm for identifying "adjacent to" words, at my blog.

like image 185
user448810 Avatar answered Nov 14 '22 17:11

user448810


I have done this myself and used it to create a (not very good) Windows game.

I used the approach recommended by others of implementing this as a graph, where each node is a word and they are connected if they differ in one letter. This means you can use well known graph theory results to find paths between words (eg simple recursion where knowing the words at distance 1 allows you to find the words at distance 2).

The tricky part is building up the graph. The bad news is that it is O(n^2). The good news is that it doesn't have to be done in real time - rather than your program reading the dictionary words from a file, it reads in the data structure you baked earlier.

The key insight is that the order doesn't matter, in fact it gets in the way. You need to construct another form in which to hold the words which strips out the order information and allows words to be compared more easily. You can do this in O(n). You have lots of choices; I will show two.

  1. For word puzzles I quit often use an encoding which I call anagram dictionary. A word is represented by another word which has the same letters but in alphabetic sequence. So "cars" becomes "acrs". Both lists and slits become "ilsst". This is a better structure for comparison than the original word, but much better comparisons exist (however, it is a very useful structure for other word puzzles).

  2. Letter counts. An array of 26 values which show the frequency of that letter in the word. So for "cars" it starts 1,0,1,0,0... as there is one "a" and one "c". Hold an external list of the non-zero entries (which letters appear in the word) so you only have to check 5 or 6 values at most instead of 26. Very simple to compare two words held in this form quickly by ensuring at most two counts are different. This is the one I would use.

So, this is how I did it.

I wrote a program which implemented the data structure up above.

It had a class called WordNode. This contains the original word; a List of all other WordNodes which are one letter different; an array of 26 integers giving the frequency of each letter, a list of the non-zero values in the letter count array.

The initialiser populates the letter frequency array and the corresponding list of non-zero values. It sets the list of connected WordNodes to zero.

After I have created an instance of the WordNode class for every word, I run a compare method which checks to see if the frequency counts are different in no more than two places. That normally takes slightly less compares than there are letters in the words; not too bad. If they are different in exactly two places they differ by one letter, and I add that WordNode into the list of WordNodes differing in only one letter.

This means we now have a graph of all the words one letter different.

You can export either the whole data structure or strip out the letter frequency and other stuff you don't need and save it (I used serialized XML. If you go that way, make sure you check it handles the List of WordNodes as references and not embedded objects).

Your actual game then only has to read in this data structure (instead of a dictionary) and it can find the words one letter different with a direct lookup, in essentially zero time.

Pity my game was crap.

like image 41
Peter Webb Avatar answered Nov 14 '22 15:11

Peter Webb


I don't know if this is the type of solution that you're looking for, but an active area of research is in constructing "edit distance 1" dictionaries for quickly looking up adjacent words (to use your parlance) for search term suggestions, data entry correction, and bioinformatics (e.g. finding similarities in chromosomes). See for example this research paper. Short of indexing your entire dictionary, at the very least this might suggest a search heuristic that you can use.

like image 3
Zim-Zam O'Pootertoot Avatar answered Nov 14 '22 17:11

Zim-Zam O'Pootertoot