Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Longest Repeated (non-overlapping) Subsequence

How to find the longest repeated (non-overlapping) subsequence (not substring)?

constraint:

The string S consist of at most 100.000 lower case character 'a'-'z'.

example:

String hanadswomehanudsiome has longest repeated (non-overlapping) subsequence handsome.

The expected time complexity is O(|S| log |S|) or better (|S| is length of string S).

like image 605
Dewangga Winardi Avatar asked Feb 22 '16 03:02

Dewangga Winardi


1 Answers

The problem, defined in the language of computer science, is finding the longest common scattered sub-sequence (LCSS) that occurs at least twice without overlap in the input string.

This is a matter of ongoing research in computer science, and it is a sub-problem of the more general longest common subsequence problem. In one form or another, solutions to this problem have been proposed going from 1977 (Hunt-Szymanski algorithm) to the present day.

One of the more recent publications on the problem is "Small Longest Tandem Scattered Subsequences" by Russo and Francisco. It claims this time complexity for the algorithm:

O(min{n, ℓ}λ(1 + log(min{λ, ℓ/λ})) + nλ + ℓ)

Where:

  • n is the size of the input string
  • λ is the size of the longest common scattered sub-sequence
  • is the number of pairs of positions in the input string that contain the same letter

However, for a constrained input size, the time complexity collapses to O(1) as was pointed out in the comments.

like image 81
mnistic Avatar answered Oct 25 '22 12:10

mnistic