A more efficient way of finding English words that are one letter off from eachother

Question

I wrote a little program that tries to find a connection between two equal length English words. Word A will transform into Word B by changing one letter at a time, each newly created word has to be an English word.

For example:

Word A = BANG
Word B = DUST

Result:

BANG -> BUNG ->BUNT -> DUNT -> DUST

My process:

Load an English wordlist(consist of 109582 words) into a Map<Integer, List<String>> _wordMap = new HashMap();, key will be the word length.
User put in 2 words.
createGraph creates a graph.
calculate the shortest path between those 2 nodes
prints out the result.

Everything works perfectly fine, but I am not satisfied with the time it took in step 3.

See:

Completely loaded 109582 words!
CreateMap took: 30 milsecs
CreateGraph took: 17417 milsecs
(HOISE : HORSE)
(HOISE : POISE)
(POISE : PRISE)
(ARISE : PRISE)
(ANISE : ARISE)
(ANILE : ANISE)
(ANILE : ANKLE)
The wholething took: 17866 milsecs

I am not satisfied with the time it takes create the graph in step 3, here's my code for it(I am using JgraphT for the graph):

private List<String> _wordList = new ArrayList();  // list of all 109582 English words
private Map<Integer, List<String>> _wordMap = new HashMap();  // Map grouping all the words by their length()
private UndirectedGraph<String, DefaultEdge> _wordGraph =
        new SimpleGraph<String, DefaultEdge>(DefaultEdge.class);  // Graph used to calculate the shortest path from one node to the other.


private void createGraph(int wordLength){

    long before = System.currentTimeMillis();
    List<String> words = _wordMap.get(wordLength);
    for(String word:words){
        _wordGraph.addVertex(word);  // adds a node
        for(String wordToTest : _wordList){
            if (isSimilar(word, wordToTest)) {        
                _wordGraph.addVertex(wordToTest);  // adds another node
                _wordGraph.addEdge(word, wordToTest);  // connecting 2 nodes if they are one letter off from eachother
            }
        }            
    }        

    System.out.println("CreateGraph took: " + (System.currentTimeMillis() - before)+ " milsecs");
}


private boolean isSimilar(String wordA, String wordB) {
    if(wordA.length() != wordB.length()){
        return false;
    }        

    int matchingLetters = 0;
    if (wordA.equalsIgnoreCase(wordB)) {
        return false;
    }
    for (int i = 0; i < wordA.length(); i++) {

        if (wordA.charAt(i) == wordB.charAt(i)) {
            matchingLetters++;
        }
    }
    if (matchingLetters == wordA.length() - 1) {
        return true;
    }
    return false;
}

My question:

How can I improve my algorithm inorder to speed up the process?

For any redditors that are reading this, yes I created this after seeing the thread from /r/askreddit yesterday.

Jon Skeet · Accepted Answer

Here's a starting thought:

Create a Map<String, List<String>> (or a Multimap<String, String> if you've using Guava), and for each word, "blank out" one letter at a time, and add the original word to the list for that blanked out word. So you'd end up with:

.ORSE => NORSE, HORSE, GORSE (etc)
H.RSE => HORSE
HO.SE => HORSE, HOUSE (etc)

At that point, given a word, you can very easily find all the words it's similar to - just go through the same process again, but instead of adding to the map, just fetch all the values for each "blanked out" version.

A more efficient way of finding English words that are one letter off from eachother

Tags:

java

algorithm

16dots

1 Answers

Jon Skeet

Recent Activity

Donate For Us

A more efficient way of finding English words that are one letter off from eachother

Tags:

java

algorithm

16dots

1 Answers

Jon Skeet

Related questions

Recent Activity

Donate For Us