Does anyone know of a really rock solid C++ library for suffix tries? Other than the one in Mummer?
Ideally, I'd like:
Some concept of concurrency.
Good caching behavior.
Permissive license.
Support for arbitrary alphabets.
A suffix tree is a tree data structure typically used to store a list of strings. It is also referred to as the compressed version of a trie, as, unlike a trie, each unique suffix in the list is compressed together and represented by a single node or branch in a suffix tree.
We build a suffix tree by following each suffix and creating an edge for each character, starting with a top node. If the new suffix to be put in the tree begins with a set of characters that are already in the tree, we follow those characters until we have a different one, creating a new branch.
Suffix trees and suffix arrays are data structures for representing texts that allow substring queries like "where does this pattern appear in the text" or "how many times does this pattern occur in the text" to be answered quickly.
The time taken by this algorithm is O(n^2) where 'n' is the length of the text. This time is essentially taken to build the trie. Note that this is one time activity and subsequent searches of another pattern in this text would take O(m) time where m is the length of the pattern.
Being a bioinformatician, my pick would be SeqAn (check out the sequence index section). It implements a lazy suffix tree and an enhanced suffix array (an equivalent data structure), both of which have good cache behaviour.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With