I'm writing an application that needs to read a list of strings from a file, save them in a data structure, and then look up those strings by prefixes. The list of strings is simply a list of words in a given language. For example, if the search function gets "stup" as a parameter, it should return ["stupid", "stupidity", "stupor"...]. It should do so in O(log(n)*m) time, where n is the size of the data structure and m is the number of results and should be as fast as possible. Memory consumption is not a big issue right now. I'm writing this in python, so it would be great if you could point me to a suitable data structure (preferably) implemented in c with python wrappers.
The best data structure for faster searching of string is TRIE. Tries are an extremely special and useful data-structure that are based on the prefix of a string. They are used to represent the “Retrieval” of data. A Trie is a special data structure used to store strings that can be visualized like a graph.
Arrays. An array is a linear data structure that holds an ordered collection of values. It's the most efficient in storing and accessing a sequence of objects.
2 Answers. i think linked list would be better ,it will take constant space and O(n) time complexity.
You want a trie.
http://en.wikipedia.org/wiki/Trie
I've used them in Scrabble and Boggle programs. They're perfect for the use case you described (fast prefix lookup).
Here's some sample code for building up a trie in Python. This is from a Boggle program I whipped together a few months ago. The rest is left as an exercise to the reader. But for prefix checking you basically need a method that starts at the root node (the variable words
), follows the letters of the prefix to successive child nodes, and returns True if such a path is found and False otherwise.
class Node(object):
def __init__(self, letter='', final=False):
self.letter = letter
self.final = final
self.children = {}
def __contains__(self, letter):
return letter in self.children
def get(self, letter):
return self.children[letter]
def add(self, letters, n=-1, index=0):
if n < 0: n = len(letters)
if index >= n: return
letter = letters[index]
if letter in self.children:
child = self.children[letter]
else:
child = Node(letter, index==n-1)
self.children[letter] = child
child.add(letters, n, index+1)
def load_dictionary(path):
result = Node()
for line in open(path, 'r'):
word = line.strip().lower()
result.add(word)
return result
words = load_dictionary('dictionary.txt')
Some Python implementations of tries:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With