Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Shortest Path between two Trie Nodes

This is a twofold question, because I'm out of ideas on how to implement this most efficiently.

I have a dictionary of 150,000 words, stored into a Trie implementation, here's what my particular implementation looks like: trie-graph

A user is given a provided with two words. With the goal being to find the shortest path of other english words (changed by one character apiece) from the start word to the end word.

For example:

Start: Dog

End: Cat

Path: Dog, Dot, Cot, Cat

Path: Dog, Cog, Log, Bog, Bot, Cot, Cat

Path: Dog, Doe, Joe, Joy, Jot, Cot, Cat


My current implementation has gone through several iterations, but the simplest I can provide pseudocode for (as the actual code is several files):

var start = "dog";
var end = "cat";
var alphabet = [a, b, c, d, e .... y, z];
var possible_words = [];

for (var letter_of_word = 0; letter_of_word < start.length; letter_of_word++) {
  for (var letter_of_alphabet = 0; letter_of_alphabet < alphabet.length; letter_of_alphabet++) {
      var new_word = start;
      new_word.characterAt(letter_of_word) = alphabet[letter_of_alphabet];
      if (in_dictionary(new_word)) {
          add_to.possible_words;
      }
  }  
}

function bfs() {
    var q = [];
    ... usual bfs implementation here ..
}

Knowns:

  • A start word and a finish word
  • Words are of the same length
  • Words are english words
  • It is possible for there to not be a path


Question:

My issue is I do not have an efficient way of determining a potential word to try without brute-forcing the alphabet and checking each new word against the dictionary. I know there is a possibility of a more efficient way using prefixes, but I can't figure out a proper implementation, or one that doesn't just double the processing.

Secondly, should I be using a different search algorithm, I've looked at A* and Best First Search as possibilities, but those require weights, which I don't have.

Thoughts?

like image 602
acupofjose Avatar asked Dec 01 '15 04:12

acupofjose


2 Answers

As requested in comments, illustrating what I mean by encoding linked words in the bits of integers.

In C++, it might look something like...

// populate a list of known words (or read from file etc)...
std::vector<std::string> words = {
    "dog", "dot", "cot", "cat", "log", "bog"
};

// create sets of one-letter-apart words...
std::unordered_map<std::string, int32_t> links;
for (auto& word : words)
    for (int i = 0; i < word.size(); ++i)
    {
        char save = word[i];
        word[i] = '_';
        links[word] |= 1 << (save - 'a');
        word[i] = save;
    }

After the above code runs, links[x] - where x is a word with one letter replaced with an underscore a la d_g - retrieves an integer indicating the letters that can replace the underscore to form known words. If the least significant bit is on, then 'dag' is a known word, if the next-from-least-significant bit is on, then 'dbg' is known word etc..

Intuitively I'd expect using integers to reduce the overall memory used for linkage data, but if the majority of words only have a couple linked words each, storing some index or pointer to those words may actually use less memory - and be easier if you're unused to bitwise manipulations, i.e.:

std::unordered_map<std::string, std::vector<const char*>> links;
for (auto& word : words)
    for (int i = 0; i < word.size(); ++i)
    {
        char save = word[i];
        word[i] = '_';
        links[word].push_back(word.c_str());
        word[i] = save;
    }

Either way, you then have a graph linking each word to those it can transform into with single-character changes. You can then apply the logic of Dijkstra's algorithm to find a shortest path between any two words.

like image 101
Tony Delroy Avatar answered Nov 11 '22 02:11

Tony Delroy


Just to add an update for those that starred this question, I've added a Github repository for an implementation in Javascript for this particular Data-structure.

https://github.com/acupajoe/Lexibit.js

Thank you all for the help and ideas!

like image 26
acupofjose Avatar answered Nov 11 '22 03:11

acupofjose