How to create a trie in Python

People also ask

Is there a trie in python?

To implement a trie, we can first create a TrieNode class, which can be used to represent a node in the trie. Below is how this class can be implemented. In this implementation, we want to store also the number of times a word has been inserted into the trie.

How do you declare a trie?

If the input key is new or an extension of the existing key, we need to construct non-existing nodes of the key, and mark the end of the word for the last node. If the input key is a prefix of the existing key in Trie, we simply mark the last node of the key as the end of a word. The key length determines Trie depth.

What is trie in coding?

In computer science, a trie, also called digital tree or prefix tree, is a type of k-ary search tree, a tree data structure used for locating specific keys from within a set. These keys are most often strings, with links between nodes defined not by the entire key, but by individual characters.

Unwind is essentially correct that there are many different ways to implement a trie; and for a large, scalable trie, nested dictionaries might become cumbersome -- or at least space inefficient. But since you're just getting started, I think that's the easiest approach; you could code up a simple trie in just a few lines. First, a function to construct the trie:

>>> _end = '_end_'
>>> 
>>> def make_trie(*words):
...     root = dict()
...     for word in words:
...         current_dict = root
...         for letter in word:
...             current_dict = current_dict.setdefault(letter, {})
...         current_dict[_end] = _end
...     return root
... 
>>> make_trie('foo', 'bar', 'baz', 'barz')
{'b': {'a': {'r': {'_end_': '_end_', 'z': {'_end_': '_end_'}}, 
             'z': {'_end_': '_end_'}}}, 
 'f': {'o': {'o': {'_end_': '_end_'}}}}

If you're not familiar with setdefault, it simply looks up a key in the dictionary (here, letter or _end). If the key is present, it returns the associated value; if not, it assigns a default value to that key and returns the value ({} or _end). (It's like a version of get that also updates the dictionary.)

Next, a function to test whether the word is in the trie:

>>> def in_trie(trie, word):
...     current_dict = trie
...     for letter in word:
...         if letter not in current_dict:
...             return False
...         current_dict = current_dict[letter]
...     return _end in current_dict
... 
>>> in_trie(make_trie('foo', 'bar', 'baz', 'barz'), 'baz')
True
>>> in_trie(make_trie('foo', 'bar', 'baz', 'barz'), 'barz')
True
>>> in_trie(make_trie('foo', 'bar', 'baz', 'barz'), 'barzz')
False
>>> in_trie(make_trie('foo', 'bar', 'baz', 'barz'), 'bart')
False
>>> in_trie(make_trie('foo', 'bar', 'baz', 'barz'), 'ba')
False

I'll leave insertion and removal to you as an exercise.

Of course, Unwind's suggestion wouldn't be much harder. There might be a slight speed disadvantage in that finding the correct sub-node would require a linear search. But the search would be limited to the number of possible characters -- 27 if we include _end. Also, there's nothing to be gained by creating a massive list of nodes and accessing them by index as he suggests; you might as well just nest the lists.

Finally, I'll add that creating a directed acyclic word graph (DAWG) would be a bit more complex, because you have to detect situations in which your current word shares a suffix with another word in the structure. In fact, this can get rather complex, depending on how you want to structure the DAWG! You may have to learn some stuff about Levenshtein distance to get it right.

Have a look at this:

https://github.com/kmike/marisa-trie

Static memory-efficient Trie structures for Python (2.x and 3.x).

String data in a MARISA-trie may take up to 50x-100x less memory than in a standard Python dict; the raw lookup speed is comparable; trie also provides fast advanced methods like prefix search.

Based on marisa-trie C++ library.

Here's a blog post from a company using marisa trie successfully:
https://www.repustate.com/blog/sharing-large-data-structure-across-processes-python/

At Repustate, much of our data models we use in our text analysis can be represented as simple key-value pairs, or dictionaries in Python lingo. In our particular case, our dictionaries are massive, a few hundred MB each, and they need to be accessed constantly. In fact for a given HTTP request, 4 or 5 models might be accessed, each doing 20-30 lookups. So the problem we face is how do we keep things fast for the client as well as light as possible for the server.

...

I found this package, marisa tries, which is a Python wrapper around a C++ implementation of a marisa trie. “Marisa” is an acronym for Matching Algorithm with Recursively Implemented StorAge. What’s great about marisa tries is the storage mechanism really shrinks how much memory you need. The author of the Python plugin claimed 50-100X reduction in size – our experience is similar.

What’s great about the marisa trie package is that the underlying trie structure can be written to disk and then read in via a memory mapped object. With a memory mapped marisa trie, all of our requirements are now met. Our server’s memory usage went down dramatically, by about 40%, and our performance was unchanged from when we used Python’s dictionary implementation.

There are also a couple of pure-python implementations, though unless you're on a restricted platform you'd want to use the C++ backed implementation above for best performance:

https://github.com/bdimmick/python-trie
https://pypi.python.org/pypi/PyTrie

Here is a list of python packages that implement Trie:

marisa-trie - a C++ based implementation.
python-trie - a simple pure python implementation.
PyTrie - a more advanced pure python implementation.
pygtrie - a pure python implementation by Google.
datrie - a double array trie implementation based on libdatrie.

Modified from senderle's method (above). I found that Python's defaultdict is ideal for creating a trie or a prefix tree.

from collections import defaultdict

class Trie:
    """
    Implement a trie with insert, search, and startsWith methods.
    """
    def __init__(self):
        self.root = defaultdict()

    # @param {string} word
    # @return {void}
    # Inserts a word into the trie.
    def insert(self, word):
        current = self.root
        for letter in word:
            current = current.setdefault(letter, {})
        current.setdefault("_end")

    # @param {string} word
    # @return {boolean}
    # Returns if the word is in the trie.
    def search(self, word):
        current = self.root
        for letter in word:
            if letter not in current:
                return False
            current = current[letter]
        if "_end" in current:
            return True
        return False

    # @param {string} prefix
    # @return {boolean}
    # Returns if there is any word in the trie
    # that starts with the given prefix.
    def startsWith(self, prefix):
        current = self.root
        for letter in prefix:
            if letter not in current:
                return False
            current = current[letter]
        return True

# Now test the class

test = Trie()
test.insert('helloworld')
test.insert('ilikeapple')
test.insert('helloz')

print test.search('hello')
print test.startsWith('hello')
print test.search('ilikeapple')

Related questions
                            
                                how do you filter pandas dataframes by multiple columns
                            
                                How to hash a string into 8 digits?
                            
                                sql "LIKE" equivalent in django query
                            
                                Bundling data files with PyInstaller (--onefile)
                            
                                Getting user input [duplicate]
                            
                                How do I get a raw, compiled SQL query from a SQLAlchemy expression?
                            
                                GroupBy pandas DataFrame and select most common value
                            
                                Round a floating-point number down to the nearest integer?
                            
                                Finding local maxima/minima with Numpy in a 1D numpy array
                            
                                Why does Python print unicode characters when the default encoding is ASCII?
                            
                                How to document class attributes in Python?
                            
                                Does python have a sorted list?
                            
                                Easiest way to rename a model using Django/South?
                            
                                How can I save an image with PIL?
                            
                                How can I create a Set of Sets in Python?
                            
                                What is exactly sklearn.pipeline.Pipeline?
                            
                                Check if argparse optional argument is set or not
                            
                                How to display full output in Jupyter, not only last result?
                            
                                Getting the caller function name inside another function in Python? [duplicate]
                            
                                How to stop flask application without using ctrl-c

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How to create a trie in Python

Tags:

python

trie

dawg

People also ask

Recent Activity

Donate For Us