Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is a generalized suffix tree?

I saw the Wikipedia page but still am not clear with the idea.

To find the longest common substring of 2 strings (T and S), I've read that we must build a suffix tree for the string T($1)S($2), where`($1) and ($2) are special characters not part of the strings.

But the Wikipedia image for the strings ABAB and BABA looks like this: Generalized suffix tree

Why doesn't it contain the entire string ABAB($1)BABA($2) ? Isn't it a suffix of the concatenated string?

What are those numbers on the leaves?

like image 918
batman Avatar asked Jan 01 '14 18:01

batman


1 Answers

A generalized suffix tree is a variation on a suffix tree in which the suffixes for two (or more) distinct strings T1 and T2 are stored, not just the suffixes of one string T.

One way to build a generalized suffix tree is to start by making a suffix tree for T1$1T2$2. This resulting suffix tree will contain all the suffixes of T1 and T2, but it will also contain a lot of "spurious" suffixes that start in T1 and spread into T2. To fix this, after building the initial suffix tree, you would typically make a second pass over the tree structure and eliminate any suffixes that extend past the $1 marker. This is why, for example, the generalized suffix tree you gave above doesn't contain ABAB$1BABA$2.

As to your next question - what are the numbers in the leaves? - each leaf in a suffix tree is typically tagged with the start index of the suffix that the leaf corresponds to. In a generalized suffix tree, each leaf is tagged with two pieces of information - the start index of the suffix, and which string that suffix belongs to. The notation a:b on a leaf means "this suffix comes from string a, and it starts at index b in that string." For example, the marker 1:3 on the leftmost leaf means "this suffix comes from string 1 and it starts at position 3." You can see that this corresponds to the suffix A$1, which does indeed start at position 3 in ABAB$1, assuming 1-indexing.

Hope this helps!

like image 93
templatetypedef Avatar answered Sep 28 '22 15:09

templatetypedef