The longest common substring problem according to wiki can be solved using a suffix tree.
From wiki:
The longest common substrings of a set of strings can be found by building a generalised suffix tree for the strings, and then finding the deepest internal nodes which have leaf nodes from all the strings in the subtree below it
I don't get this.
Example: if I have:ABCDE
and XABCZ
then the suffix tree is (some branches from XABCZ
omitted due to space):
The longest common substring is ABC
but it is not I can not see how the description of wiki helps here.ABC
is not the deepest internal nodes with leaf nodes.
Any help to understand how this works?
Like what's been said before, your tree is incorrect.
This is what I get when running "ABCDE$XABCZ" through my code.
Suffix Tree code:
String = ABCDE$XABCZ$
End of word character 1 = $
└── (0)
├── (20) $
├── (22) ABC
│ ├── (15) DE$
│ └── (23) Z$
├── (24) BC
│ ├── (16) DE$
│ └── (25) Z$
├── (26) C
│ ├── (17) DE$
│ └── (27) Z$
├── (18) DE$
├── (19) E$
├── (21) XABCZ$
└── (28) Z$
In a (compact) suffix tree, you need to find the deepest internal node(s) which have leaf nodes from all the strings. If you have multiple nodes at the same depth, you have to compare the length of the string represented by that node. i.e. ABC, BC, and C all have the same depth, so you have to compare the length of ABC, BC, and C strings to see which is longer; which is ABC obviously.
Suffix Trie code:
└── null
├── A
│ └── B
│ └── C
│ ├── D
│ │ └── (E) ABCDE
│ └── (Z) ABCZ
├── B
│ └── C
│ ├── D
│ │ └── (E) BCDE
│ └── (Z) BCZ
├── C
│ ├── D
│ │ └── (E) CDE
│ └── (Z) CZ
├── D
│ └── (E) DE
├── (E) E
├── X
│ └── A
│ └── B
│ └── C
│ └── (Z) XABCZ
└── (Z) Z
In a (not compact) suffix trie, find the deepest internal node(s) which have leaf nodes from all the strings.
Hope it helps.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With