How to find longest common substring using trees?

Question

The longest common substring problem according to wiki can be solved using a suffix tree.
From wiki:

The longest common substrings of a set of strings can be found by building a generalised suffix tree for the strings, and then finding the deepest internal nodes which have leaf nodes from all the strings in the subtree below it

I don't get this.
Example: if I have:
ABCDE and XABCZ
then the suffix tree is (some branches from XABCZ omitted due to space):
enter image description here

The longest common substring is ABC but it is not I can not see how the description of wiki helps here.
ABC is not the deepest internal nodes with leaf nodes.
Any help to understand how this works?

Justin · Accepted Answer

Like what's been said before, your tree is incorrect.

This is what I get when running "ABCDE$XABCZ" through my code.

Suffix Tree code:

String = ABCDE$XABCZ$
End of word character 1 = $
└── (0)
    ├── (20) $
    ├── (22) ABC
    │   ├── (15) DE$
    │   └── (23) Z$
    ├── (24) BC
    │   ├── (16) DE$
    │   └── (25) Z$
    ├── (26) C
    │   ├── (17) DE$
    │   └── (27) Z$
    ├── (18) DE$
    ├── (19) E$
    ├── (21) XABCZ$
    └── (28) Z$

In a (compact) suffix tree, you need to find the deepest internal node(s) which have leaf nodes from all the strings. If you have multiple nodes at the same depth, you have to compare the length of the string represented by that node. i.e. ABC, BC, and C all have the same depth, so you have to compare the length of ABC, BC, and C strings to see which is longer; which is ABC obviously.

Suffix Trie code:

└── null
    ├── A
    │   └── B
    │       └── C
    │           ├── D
    │           │   └── (E) ABCDE
    │           └── (Z) ABCZ
    ├── B
    │   └── C
    │       ├── D
    │       │   └── (E) BCDE
    │       └── (Z) BCZ
    ├── C
    │   ├── D
    │   │   └── (E) CDE
    │   └── (Z) CZ
    ├── D
    │   └── (E) DE
    ├── (E) E
    ├── X
    │   └── A
    │       └── B
    │           └── C
    │               └── (Z) XABCZ
    └── (Z) Z

In a (not compact) suffix trie, find the deepest internal node(s) which have leaf nodes from all the strings.

Hope it helps.

How to find longest common substring using trees?

Tags:

java

substring

algorithm

data-structures

tree

Cratylus

1 Answers

Justin

Recent Activity

Donate For Us

How to find longest common substring using trees?

Tags:

java

substring

algorithm

data-structures

tree

Cratylus

1 Answers

Justin

Related questions

Recent Activity

Donate For Us