
Trie tree match performance in word search

I have been debugging a few similar solutions and am wondering whether the Trie could be improved to support partial (prefix) matching. The current search method of class Trie only checks whether a full word is matched. Could prefix matching improve performance further by returning from a wrong path earlier? I am not very confident in the idea, so I am seeking advice early.

I have posted one of the similar solutions below. Thanks.


Given a 2D board and a list of words from the dictionary, find all words in the board.

Each word must be constructed from letters of sequentially adjacent cells, where "adjacent" cells are those horizontally or vertically neighboring. The same letter cell may not be used more than once in a word.

For example, Given words = ["oath","pea","eat","rain"] and board =

[
  ['o','a','a','n'],
  ['e','t','a','e'],
  ['i','h','k','r'],
  ['i','f','l','v']
]

Return ["eat","oath"]

import collections

class TrieNode():
    def __init__(self):
        self.children = collections.defaultdict(TrieNode)
        self.isWord = False

class Trie():
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for w in word:
            node = node.children[w]
        node.isWord = True

    def search(self, word):
        node = self.root
        for w in word:
            node = node.children.get(w)
            if not node:
                return False
        return node.isWord

class Solution(object):
    def findWords(self, board, words):
        res = []
        trie = Trie()
        node = trie.root
        for w in words:
            trie.insert(w)
        # range works on both Python 2 and Python 3; xrange exists only on Python 2
        for i in range(len(board)):
            for j in range(len(board[0])):
                self.dfs(board, node, i, j, "", res)
        return res

    def dfs(self, board, node, i, j, path, res):
        if node.isWord:
            res.append(path)
            node.isWord = False
        if i < 0 or i >= len(board) or j < 0 or j >= len(board[0]):
            return 
        tmp = board[i][j]
        node = node.children.get(tmp)
        if not node:
            return 
        board[i][j] = "#"
        self.dfs(board, node, i+1, j, path+tmp, res)
        self.dfs(board, node, i-1, j, path+tmp, res)
        self.dfs(board, node, i, j-1, path+tmp, res)
        self.dfs(board, node, i, j+1, path+tmp, res)
        board[i][j] = tmp
Lin Ma asked Oct 25 '15 08:10



1 Answer

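I don't see anything wrong with the Trie part of your code.

But I think the trie's original design already returns early when it detects a mismatch. In your dfs, the walk stops as soon as node.children.get(tmp) finds no child, so a wrong path is abandoned at the first letter that no word in the trie continues with.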
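To make the early return concrete, here is a minimal sketch built on the Trie from the question. The walk and starts_with methods are hypothetical names added for illustration and are not part of the original post. walk reports how many characters were consumed before the lookup failed, which shows that both a full-word search and a prefix check stop at the first mismatching character.

```python
import collections


class TrieNode(object):
    def __init__(self):
        self.children = collections.defaultdict(TrieNode)
        self.isWord = False


class Trie(object):
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children[ch]  # defaultdict creates missing nodes here
        node.isWord = True

    def walk(self, text):
        """Hypothetical helper: follow text, return (node or None, chars matched)."""
        node = self.root
        matched = 0
        for ch in text:
            node = node.children.get(ch)  # .get() never creates a node
            if node is None:
                return None, matched      # early return on the first mismatch
            matched += 1
        return node, matched

    def search(self, word):
        """True only if word was inserted as a complete word."""
        node, _ = self.walk(word)
        return node is not None and node.isWord

    def starts_with(self, prefix):
        """True if any inserted word begins with prefix."""
        node, _ = self.walk(prefix)
        return node is not None


trie = Trie()
for w in ["oath", "pea", "eat", "rain"]:
    trie.insert(w)
```

The dfs in the question already gets this behavior without a separate prefix method, because it carries the current trie node along the board path instead of re-searching the whole path from the root at every step.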

Actually, I usually just use a regular dict as a trie instead of defaultdict + TrieNode, to avoid over-complicating the problem. You only need to set a "#" key on a node if that node ends a valid word. During insertion, create node[w] = {} only when the key w is not already present (for example with node = node.setdefault(w, {})), so that words sharing a prefix do not overwrite each other.

If you do this, your code can be significantly simplified, and early return becomes straightforward, because a node never contains a "wrong" key at all!

For example, a simple trie containing only 'ab' will look like: {'a': {'b': {'#': {}}}}. So when you search for 'cd', as soon as you realize there is no key 'c' in the outermost dict, you can return False. This implementation is similar to yours, but I believe it's easier to understand.
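A minimal sketch of the dict-based trie, applied to the board problem from the question. The names END, build_trie, search and find_words are hypothetical and chosen for illustration. The sketch assumes that "#" never occurs in a word or on the board, and it marks visited cells with None rather than "#", because "#" is now a real key in the trie.

```python
END = "#"  # end-of-word marker key; assumes "#" is never a letter


def build_trie(words):
    """Build a nested-dict trie; {'a': {'b': {'#': {}}}} for ['ab']."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})  # reuse an existing child if present
        node[END] = {}
    return root


def search(root, word):
    """Full-word lookup that returns at the first missing key."""
    node = root
    for ch in word:
        if ch not in node:
            return False
        node = node[ch]
    return END in node


def find_words(board, words):
    """Collect every word from words that can be traced on the board."""
    root = build_trie(words)
    rows = len(board)
    cols = len(board[0]) if rows else 0
    found = []

    def dfs(node, i, j, path):
        if END in node:
            found.append(path)
            del node[END]  # report each word only once
        if i < 0 or i >= rows or j < 0 or j >= cols:
            return
        ch = board[i][j]
        child = node.get(ch)  # None for unknown letters and for visited cells
        if child is None:
            return            # early return: no word continues this way
        board[i][j] = None    # mark the cell as used on the current path
        for di, dj in ((1, 0), (-1, 0), (0, -1), (0, 1)):
            dfs(child, i + di, j + dj, path + ch)
        board[i][j] = ch      # restore the cell when backtracking

    for i in range(rows):
        for j in range(cols):
            dfs(root, i, j, "")
    return found


board = [
    ['o', 'a', 'a', 'n'],
    ['e', 't', 'a', 'e'],
    ['i', 'h', 'k', 'r'],
    ['i', 'f', 'l', 'v'],
]
result = find_words(board, ["oath", "pea", "eat", "rain"])
```

The order of the words in result depends on the traversal order, so compare it as a sorted list or a set.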

stanleyli answered Oct 17 '22 16:10
