I am implementing a Trie in python. Till now I have come across two different methods to implement it:
char
- to store characteris_end
- to store end of word (true or false)prefix_count
- store number of words with current prefix
child
- Node type dict (to store other nodes i.e. for 26 alphabets)
class Node(object):
def __init__(self):
self.char = ''
self.word = ''
self.is_end = False
self.prefix_count = 0
self.child = {}
e.g. for the input words = {'foo', 'bar', 'baz', 'barz'}
we derive this dictionary:
{'b': {'a': {'r': {'_end_': '_end_', 'z': {'_end_': '_end_'}},
'z': {'_end_': '_end_'}}},
'f': {'o': {'o': {'_end_': '_end_'}}}}
Which is the efficient and standard data structure, which is both memory efficient and fast for traversal and other trie operations on big dataset of words?
Why not both? Just yesterday I was implementing a similar data structure for storing and retrieving a hierarchy of objects and contemplated this exact situation. Ended up using a Node object with a children dictionary. The main node being an object allows you to have custom methods for printing it or getting stuff, and you can even have a lazy initialization if needed (you mentioned big dataset right?)
class Node(object):
def __init__(self):
self.children = collections.defaultdict(lambda:self.__class__())
self.stuff = ...
def __str__(self,depth=0):
if self.children:
return str(self.stuff)+'\n'+'\n'.join([' '*depth+k+' ' + v.__str__(depth+1) for k,v in self.children.items()])
else:
return str(self.stuff)
def get_path(self,path):
node = self
while path:
node = node.children[path.pop(0)]
...
A direct replacement would be nested a list
;
However [arguably] more Pythonic, more compact in memory and thus faster for lookup would be a nested tuple
.
Of course, updating such trie becomes logN, as you'll have to recreate every ancestor node. Still, if your lookups are a lot more frequent than updates, it may be worth it.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With