Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

memory effective search in suffix list

There is a task to build some sort of a dictionary for list of suffixes, for an instance:

[., .com., a.com., a.b.com., org., some.org., ...]

and for each incoming string, for an instance "test.some.org." find the longest suffix in the built dictionary. There are some memory limitations. What is the most proper algorithm / data structure for this case?

Most obvious choice for me is a trie for reversed strings, but it seems very memory consuming. I've tried to use suffix array but it looks like it does not fit the task.

The dictionary is immutable, it has to be built once. Are the more memory-efficient representations of immutable tries ?

like image 315
voidlizard Avatar asked Oct 21 '22 07:10

voidlizard


1 Answers

For an immutable set of string, the compressed trie works very well. The main idea is to represent a single branch in the trie as one node. There are many useful descriptions/papers of this method in the web.

like image 167
pkacprzak Avatar answered Oct 27 '22 20:10

pkacprzak