Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

A more efficient way of finding English words that are one letter off from eachother

Tags:

java

algorithm

I wrote a little program that tries to find a connection between two equal length English words. Word A will transform into Word B by changing one letter at a time, each newly created word has to be an English word.

For example:

Word A = BANG
Word B = DUST

Result:

BANG -> BUNG ->BUNT -> DUNT -> DUST

My process:

  1. Load an English wordlist(consist of 109582 words) into a Map<Integer, List<String>> _wordMap = new HashMap();, key will be the word length.

  2. User put in 2 words.

  3. createGraph creates a graph.

  4. calculate the shortest path between those 2 nodes

  5. prints out the result.

Everything works perfectly fine, but I am not satisfied with the time it took in step 3.

See:

Completely loaded 109582 words!
CreateMap took: 30 milsecs
CreateGraph took: 17417 milsecs
(HOISE : HORSE)
(HOISE : POISE)
(POISE : PRISE)
(ARISE : PRISE)
(ANISE : ARISE)
(ANILE : ANISE)
(ANILE : ANKLE)
The wholething took: 17866 milsecs

I am not satisfied with the time it takes create the graph in step 3, here's my code for it(I am using JgraphT for the graph):

private List<String> _wordList = new ArrayList();  // list of all 109582 English words
private Map<Integer, List<String>> _wordMap = new HashMap();  // Map grouping all the words by their length()
private UndirectedGraph<String, DefaultEdge> _wordGraph =
        new SimpleGraph<String, DefaultEdge>(DefaultEdge.class);  // Graph used to calculate the shortest path from one node to the other.


private void createGraph(int wordLength){

    long before = System.currentTimeMillis();
    List<String> words = _wordMap.get(wordLength);
    for(String word:words){
        _wordGraph.addVertex(word);  // adds a node
        for(String wordToTest : _wordList){
            if (isSimilar(word, wordToTest)) {        
                _wordGraph.addVertex(wordToTest);  // adds another node
                _wordGraph.addEdge(word, wordToTest);  // connecting 2 nodes if they are one letter off from eachother
            }
        }            
    }        

    System.out.println("CreateGraph took: " + (System.currentTimeMillis() - before)+ " milsecs");
}


private boolean isSimilar(String wordA, String wordB) {
    if(wordA.length() != wordB.length()){
        return false;
    }        

    int matchingLetters = 0;
    if (wordA.equalsIgnoreCase(wordB)) {
        return false;
    }
    for (int i = 0; i < wordA.length(); i++) {

        if (wordA.charAt(i) == wordB.charAt(i)) {
            matchingLetters++;
        }
    }
    if (matchingLetters == wordA.length() - 1) {
        return true;
    }
    return false;
}

My question:

How can I improve my algorithm inorder to speed up the process?

For any redditors that are reading this, yes I created this after seeing the thread from /r/askreddit yesterday.

like image 281
16dots Avatar asked Nov 10 '12 18:11

16dots


1 Answers

Here's a starting thought:

Create a Map<String, List<String>> (or a Multimap<String, String> if you've using Guava), and for each word, "blank out" one letter at a time, and add the original word to the list for that blanked out word. So you'd end up with:

.ORSE => NORSE, HORSE, GORSE (etc)
H.RSE => HORSE
HO.SE => HORSE, HOUSE (etc)

At that point, given a word, you can very easily find all the words it's similar to - just go through the same process again, but instead of adding to the map, just fetch all the values for each "blanked out" version.

like image 77
Jon Skeet Avatar answered Sep 23 '22 07:09

Jon Skeet