Memory Efficient data structure for Trie Implementation

Question

I am implementing a Trie in python. Till now I have come across two different methods to implement it:

1) use a class Node (similar to struct Node in C++) with data members:

char - to store character
is_end - to store end of word (true or false)
prefix_count - store number of words with current prefix

child - Node type dict (to store other nodes i.e. for 26 alphabets)

class Node(object):
    def __init__(self):
        self.char = ''
        self.word = ''
        self.is_end = False
        self.prefix_count = 0
        self.child = {}

2) use a dictionary to store all the data:

e.g. for the input words = {'foo', 'bar', 'baz', 'barz'}

we derive this dictionary:

         {'b': {'a': {'r': {'_end_': '_end_', 'z': {'_end_': '_end_'}},
          'z': {'_end_': '_end_'}}},
          'f': {'o': {'o': {'_end_': '_end_'}}}}

Which is the efficient and standard data structure, which is both memory efficient and fast for traversal and other trie operations on big dataset of words?

Josep Valls · Accepted Answer

Why not both? Just yesterday I was implementing a similar data structure for storing and retrieving a hierarchy of objects and contemplated this exact situation. Ended up using a Node object with a children dictionary. The main node being an object allows you to have custom methods for printing it or getting stuff, and you can even have a lazy initialization if needed (you mentioned big dataset right?)

class Node(object):
    def __init__(self):
        self.children = collections.defaultdict(lambda:self.__class__())
        self.stuff = ...
    def __str__(self,depth=0):
        if self.children:
            return str(self.stuff)+'
'+'
'.join([' '*depth+k+' ' + v.__str__(depth+1) for k,v in self.children.items()])
        else:
            return str(self.stuff)
    def get_path(self,path):
        node = self
        while path:
            node = node.children[path.pop(0)]
        ...

Dima Tisnek · Answer

A direct replacement would be nested a list;

However [arguably] more Pythonic, more compact in memory and thus faster for lookup would be a nested tuple.

Of course, updating such trie becomes logN, as you'll have to recreate every ancestor node. Still, if your lookups are a lot more frequent than updates, it may be worth it.

Memory Efficient data structure for Trie Implementation

Tags:

python

data-structures

tree

nlp

trie

1) use a class Node (similar to struct Node in C++) with data members:

2) use a dictionary to store all the data:

divyum

2 Answers

Josep Valls

Dima Tisnek

Recent Activity

Donate For Us

Memory Efficient data structure for Trie Implementation

Tags:

python

data-structures

tree

nlp

trie

1) use a class Node (similar to struct Node in C++) with data members:

2) use a dictionary to store all the data:

divyum

2 Answers

Josep Valls

Dima Tisnek

Related questions

Recent Activity

Donate For Us