Shortest Path between two Trie Nodes

Question

This is a twofold question, because I'm out of ideas on how to implement this most efficiently.

I have a dictionary of 150,000 words, stored into a Trie implementation, here's what my particular implementation looks like: trie-graph

A user is given a provided with two words. With the goal being to find the shortest path of other english words (changed by one character apiece) from the start word to the end word.

For example:

Start: Dog

End: Cat

Path: Dog, Dot, Cot, Cat

Path: Dog, Cog, Log, Bog, Bot, Cot, Cat

Path: Dog, Doe, Joe, Joy, Jot, Cot, Cat

My current implementation has gone through several iterations, but the simplest I can provide pseudocode for (as the actual code is several files):

var start = "dog";
var end = "cat";
var alphabet = [a, b, c, d, e .... y, z];
var possible_words = [];

for (var letter_of_word = 0; letter_of_word < start.length; letter_of_word++) {
  for (var letter_of_alphabet = 0; letter_of_alphabet < alphabet.length; letter_of_alphabet++) {
      var new_word = start;
      new_word.characterAt(letter_of_word) = alphabet[letter_of_alphabet];
      if (in_dictionary(new_word)) {
          add_to.possible_words;
      }
  }  
}

function bfs() {
    var q = [];
    ... usual bfs implementation here ..
}

Knowns:

A start word and a finish word
Words are of the same length
Words are english words
It is possible for there to not be a path

Question:

My issue is I do not have an efficient way of determining a potential word to try without brute-forcing the alphabet and checking each new word against the dictionary. I know there is a possibility of a more efficient way using prefixes, but I can't figure out a proper implementation, or one that doesn't just double the processing.

Secondly, should I be using a different search algorithm, I've looked at A* and Best First Search as possibilities, but those require weights, which I don't have.

Thoughts?

Tony Delroy · Accepted Answer

As requested in comments, illustrating what I mean by encoding linked words in the bits of integers.

In C++, it might look something like...

// populate a list of known words (or read from file etc)...
std::vector<std::string> words = {
    "dog", "dot", "cot", "cat", "log", "bog"
};

// create sets of one-letter-apart words...
std::unordered_map<std::string, int32_t> links;
for (auto& word : words)
    for (int i = 0; i < word.size(); ++i)
    {
        char save = word[i];
        word[i] = '_';
        links[word] |= 1 << (save - 'a');
        word[i] = save;
    }

After the above code runs, links[x] - where x is a word with one letter replaced with an underscore a la d_g - retrieves an integer indicating the letters that can replace the underscore to form known words. If the least significant bit is on, then 'dag' is a known word, if the next-from-least-significant bit is on, then 'dbg' is known word etc..

Intuitively I'd expect using integers to reduce the overall memory used for linkage data, but if the majority of words only have a couple linked words each, storing some index or pointer to those words may actually use less memory - and be easier if you're unused to bitwise manipulations, i.e.:

std::unordered_map<std::string, std::vector<const char*>> links;
for (auto& word : words)
    for (int i = 0; i < word.size(); ++i)
    {
        char save = word[i];
        word[i] = '_';
        links[word].push_back(word.c_str());
        word[i] = save;
    }

Either way, you then have a graph linking each word to those it can transform into with single-character changes. You can then apply the logic of Dijkstra's algorithm to find a shortest path between any two words.

acupofjose · Answer

Just to add an update for those that starred this question, I've added a Github repository for an implementation in Javascript for this particular Data-structure.

https://github.com/acupajoe/Lexibit.js

Thank you all for the help and ideas!

Shortest Path between two Trie Nodes

Tags:

performance

python

javascript

algorithm

sorting

Question:

acupofjose

2 Answers

Tony Delroy

acupofjose

Recent Activity

Donate For Us

Shortest Path between two Trie Nodes

Tags:

performance

python

javascript

algorithm

sorting

Question:

acupofjose

2 Answers

Tony Delroy

acupofjose

Related questions

Recent Activity

Donate For Us