Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Autocomplete using a trie

I am working on an autocompletion script and was thinking about using a trie. My problem is I want everything that matches to be returned. So for example I type in the letter r I want all entries starting with r to be returned. Then all entries starting with re etc. Is this feasible with a trie and how would it work. Also, if there is a better way I am open to suggestions. The reason I ask is it seems like it would be complicated and a whole lot of processing to return all of the nodes off of say the r branch.

And yes I may be reinventing the wheel, but I would like to learn how it works.

like image 560
qw3n Avatar asked Feb 16 '11 22:02

qw3n


People also ask

Does Google Autocomplete use trie?

Google also stores each word/sentence in the form of a trie.

Which data structure is used for autocomplete?

Autocomplete is a feature of suggesting possible extensions to a partially written text and is widely used in search engine, code IDEs and much more. Trie data structure is a perfect fit to implement this feature efficient in terms of memory and time [O(length of string)].

Is trie faster than hashing?

The trie solution is more flexible to support more applications, such as auto-complete. Also, we can easily print all the words in the dictionary in alphabetic order with a trie. Therefore, if we want a full-text lookup application, the hash table is better as it has a faster lookup speed.


1 Answers

You can absolutely do it using a trie. Here is some code I threw together that can point you in the right direction:

var tokenTree = function (tokenArray) {
  var createLetterObject = function (l) {
    var pChildren = [];

    var getMatchingWords = function (characterArr, availableWords, children) {
        if (characterArr.length === 0) {
            for (var child in children) {
                if ({}.hasOwnProperty.call(children, child)) {
                    var currentChild = children[child];

                    var words = currentChild.getWords(characterArr);

                    for (var pos in words) {
                        if ({}.hasOwnProperty.call(words, pos)) {
                            availableWords.push(words[pos]);
                        }
                    }

                    if (currentChild.word) {
                        availableWords.push(currentChild.word);
                    }
                }
            }
        } else {
            var currentCharacter = characterArr.pop();
            getMatchingWords(characterArr, availableWords, children[currentCharacter].children);
        }
    };

    function doGetWords(wordPart) {
        var len = wordPart.length;
        var ar = [];
        var wordList = [];

        for (var ii = len - 1; ii >= 0; ii --) {
            ar.push(wordPart[ii].toUpperCase());
        }

        getMatchingWords(ar, wordList, pChildren);

        return wordList;
    }

    return {
        letter: l,
        children: pChildren,
        parent: null,
        word: null,
        getWords: doGetWords
    };
};

var startingPoint = createLetterObject();

function parseWord(wordCharacterArray, parent, fullWord) {
    if (wordCharacterArray.length === 0) {
        parent.word = fullWord;
        return;
    }

    var currentCharacter = wordCharacterArray.pop().toUpperCase();

    if (!parent.children[currentCharacter]) {
        parent.children[currentCharacter] = createLetterObject(currentCharacter);
    }

    parseWord(wordCharacterArray, parent.children[currentCharacter], fullWord);
}

for (var counter in tokenArray) {
    if ({}.hasOwnProperty.call(tokenArray, counter)) {
        var word = tokenArray[counter];

        if (!word) {
            continue;
        }

        var ar = [];

        var wordLength = word.length;

        for (var ii = wordLength - 1; ii >= 0; ii--) {
            ar.push(word[ii]);
        }

        parseWord(ar, startingPoint, word);
    }
}

  return startingPoint;
};

Usage

var tokens = ["Token", "words", "whohaa", "mommy", "test", "wicked"];
var tree = tokenTree(tokens);
var currentTokenSet = 'w'; 
var list = tree.getWords(currentTokenSet);

// it will return words,whohaa,wicked.
console.log(list) 

I'm not saying this is anywhere near the best or most efficient way, but it should at least get you pointed in the right direction.

Here is a jsfiddle showing it working: https://jsfiddle.net/es6xp8h9/

like image 85
kemiller2002 Avatar answered Sep 18 '22 14:09

kemiller2002