Could anyone give me an example of how and when to create a suffix link in a suffix tree? My string is ABABABC, but please use a different example if that works better. Pictures illustrating each step would be very much appreciated.


How and when to create a suffix link in suffix tree?

1 Answer

To understand this, first recall that there are three kinds of nodes in a suffix tree:

The root
Internal nodes
Leaf nodes

In the graph (https://i.stack.imgur.com/QMNzA.png), which shows the suffix tree for ABABABC, the yellow circle is the root, the gray, blue and green ones are internal nodes, and the small black ones are leaves.

There are two important things to notice:

Internal nodes always have more than 1 outgoing edge. That is, internal nodes mark those parts of the tree where branching occurs.
Branching occurs only where a repeated string is involved: a node branches when the string leading to it is followed by different characters at different occurrences. For any internal node X, the string leading from the root to X must therefore occur in the input string at least as many times as there are outgoing edges from X.

Example: The string leading to the blue node is ABAB. Indeed, this string appears twice in the input string: at position 0 and at position 2. The first occurrence is followed by A, the second by C, and that is why the blue node exists.
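
The branching rule can be checked by brute force. The following Python sketch (the helper names `occurrences` and `branching_strings` are made up for this illustration) collects, for every substring, the characters that follow it. Substrings followed by two or more different characters are the ones that lead to internal nodes. It assumes, as in ABABABC, that the last character of the input appears nowhere else, so that no suffix is a prefix of another. It is far too slow for real inputs and is only meant to make the rule concrete.

```python
from collections import defaultdict


def occurrences(text, s):
    """Return every start position at which s occurs in text (overlaps included)."""
    return [i for i in range(len(text)) if text.startswith(s, i)]


def branching_strings(text):
    """Map each non-empty substring that is followed by two or more distinct
    characters to the sorted list of those characters."""
    followers = defaultdict(set)
    n = len(text)
    for i in range(n):
        for j in range(i + 1, n):
            # text[i:j] is a substring that is followed by the character text[j]
            followers[text[i:j]].add(text[j])
    return {s: sorted(chars) for s, chars in followers.items() if len(chars) >= 2}


text = "ABABABC"
print(occurrences(text, "ABAB"))
for s, chars in branching_strings(text).items():
    print(s, "is followed by", chars)
```

For ABABABC this reports ABAB at positions 0 and 2, and the branching strings AB, ABAB, B and BAB, which are the strings leading to the four internal nodes in the picture.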

Now about suffix links:

1. If the string s leading up to some internal node X is longer than 1 character, the same string minus its first character (call this s_-1) must be in the tree, too (it's a suffix tree, after all, so every suffix of any of its strings must be in the tree, too).

   Example: Let s=ABAB, the string leading to the blue node. After removing the first character, s_-1 is BAB. And indeed that string is found in the tree, too: it leads to the green node.
2. If some string s leads to an internal node, its shortened version s_-1 must lead to an internal node (call it X_-1) as well. Why? Because X branches, s appears at least twice in the input string, followed by at least two different characters. Wherever s appears, s_-1 appears too, followed by the same character. So s_-1 is also followed by at least two different characters, and there must be an internal node for it.
3. In any such situation, a special link connecting X to X_-1 is a suffix link.

Note: Because of (1) and (2) above, every internal node X that has a label from root to X of more than 1 character must have a suffix link to exactly one other internal node.
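
As a small check of this note, the sketch below derives the links directly from the definition. The four strings are the internal-node labels read off the example tree, and `suffix_links` is a hypothetical helper written for this illustration, not part of any library.

```python
def suffix_links(internal):
    """Map each internal-node string longer than 1 character to its link target."""
    links = {}
    for s in sorted(internal):
        if len(s) > 1:
            target = s[1:]  # s_-1: s without its first character
            if target not in internal:
                raise ValueError(f"{target!r} should lead to an internal node")
            links[s] = target
    return links


# Strings leading to the internal nodes of the suffix tree for ABABABC.
internal = {"ABAB", "BAB", "AB", "B"}
for source, target in suffix_links(internal).items():
    print(source, "->", target)
```

For the example this yields three links (ABAB to BAB, BAB to AB, AB to B). The single-character node B gets none under this definition.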

Example:

This is the same suffix tree as before (https://i.stack.imgur.com/Hzl2w.png); the dotted lines indicate the suffix links. If you start at the blue node and follow the suffix links from there (from blue, to green, to first gray, to second gray), and look at the strings leading from the root to each node, you will see this:

 ABAB   ->    BAB    ->    AB    ->    B
(blue)      (green)     (gray1)     (gray2)

This is why they are called suffix links (the entire sequence is called a suffix chain).
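
In code, a suffix link is usually just one extra pointer per node. Here is a minimal sketch of walking the chain, assuming a made-up `Node` class that stores only the path label and the link (a real suffix tree node would also hold its edges):

```python
class Node:
    def __init__(self, label):
        self.label = label  # string on the path from the root to this node
        self.link = None    # suffix link; None if the node has none


def suffix_chain(node):
    """Follow suffix links from node and return the labels visited."""
    chain = []
    while node is not None:
        chain.append(node.label)
        node = node.link
    return chain


# The four internal nodes of the example, linked as in the picture.
blue, green, gray1, gray2 = (Node(s) for s in ("ABAB", "BAB", "AB", "B"))
blue.link, green.link, gray1.link = green, gray1, gray2

print(" -> ".join(suffix_chain(blue)))
```

Each step along the chain costs one pointer dereference, regardless of how long the strings are.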

What are they good for?

They are good for surprisingly many things. In particular, they play an important role in Ukkonen's algorithm for suffix tree construction (https://stackoverflow.com/a/9513423/777186), specifically in Rule 3 described there: after inserting the final character of some suffix s at some point X, the algorithm needs to insert the final character of suffix s_-1 in O(1) time. To do that, it uses the suffix link to jump right to the place X_-1 and makes the insert there.
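
To show when the links get created during construction, here is a compact sketch of Ukkonen's algorithm in the common "active point" formulation. It is one possible way to code it, not necessarily how the linked answer would, and all names (`Node`, `build`, `links_by_label`, `pending`) are my own. Within one step, every newly split internal node is remembered in `pending`; as soon as the next insertion point in the same step is known, the pending node gets a suffix link to it. The sketch assumes the last character of the input is unique.

```python
class Node:
    def __init__(self, start, end=None):
        self.start = start   # index of the first character on the incoming edge
        self.end = end       # one past the last character; None = leaf edge that keeps growing
        self.children = {}   # first character of an outgoing edge -> child node
        self.link = None     # suffix link, set only on internal nodes


def build(text):
    root = Node(-1, -1)
    node, edge, length = root, 0, 0  # active point
    remainder = 0                    # suffixes still waiting to be inserted
    for pos, ch in enumerate(text):
        pending = None               # internal node from this step that still needs its link
        remainder += 1
        while remainder > 0:
            if length == 0:
                edge = pos
            first = text[edge]
            child = node.children.get(first)
            if child is None:
                node.children[first] = Node(pos)       # new leaf directly below node
                if pending is not None:
                    pending.link = node
                    pending = None
            else:
                end = pos + 1 if child.end is None else child.end
                edge_len = end - child.start
                if length >= edge_len:                 # active point lies beyond this edge
                    node, edge, length = child, edge + edge_len, length - edge_len
                    continue
                if text[child.start + length] == ch:   # character already present: stop this step
                    length += 1
                    if pending is not None:
                        pending.link = node
                    break
                # Split the edge: a new internal node with a new leaf for ch.
                split = Node(child.start, child.start + length)
                node.children[first] = split
                split.children[ch] = Node(pos)
                child.start += length
                split.children[text[child.start]] = child
                if pending is not None:
                    pending.link = split               # previous split node links to this one
                pending = split
            remainder -= 1
            if node is root and length > 0:
                length -= 1
                edge = pos - remainder + 1
            elif node is not root:
                # Jump to the place for the next shorter suffix via the suffix link.
                node = node.link if node.link is not None else root
    return root


def links_by_label(text):
    """Build the tree and return {path label of node: path label of its link target}."""
    root = build(text)
    label = {root: ""}
    stack = [root]
    while stack:
        n = stack.pop()
        for c in n.children.values():
            end = len(text) if c.end is None else c.end
            label[c] = label[n] + text[c.start:end]
            stack.append(c)
    return {label[n]: label[n.link] for n in label if n.link is not None}


print(links_by_label("ABABABC"))
```

For ABABABC all four internal nodes are created in the last step, when C arrives, and they are linked one after another: ABAB to BAB, BAB to AB, AB to B. In this sketch the single-character node B also receives an explicit link to the root (shown as the empty string).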

Note, however, that suffix links are not required in a suffix tree. They are not part of the definition of a suffix tree; they are just special links used by some algorithms that construct or use suffix trees.

Regarding single-character nodes: What if there is an internal node X whose string (i.e. the string on the path from the root to X) consists of only one character? By the definition above, X does not have a suffix link. You can, however, treat it as if it had one pointing to the root node, because s_-1 is then the empty string, which the root represents. Conversely, if an internal node has no suffix link under the definition above, it must be a single-character node, so a missing suffix link can always be read as a link to the root. (Some algorithms actually add an explicit suffix link pointing to the root node in this case.) Thanks to j_random_hacker for the comment about this.

answered Nov 02 '22 23:11

jogojapan


Tags:

algorithm

suffix-tree

lingguang1997
