How to find the longest repeated (non-overlapping) subsequence (not substring)?
constraint:
The string S consist of at most 100.000 lower case character 'a'-'z'.
example:
String hanadswomehanudsiome has longest repeated (non-overlapping) subsequence handsome.
The expected time complexity is O(|S| log |S|) or better (|S| is length of string S).
The problem, defined in the language of computer science, is finding the longest common scattered sub-sequence (LCSS) that occurs at least twice without overlap in the input string.
This is a matter of ongoing research in computer science, and it is a sub-problem of the more general longest common subsequence problem. In one form or another, solutions to this problem have been proposed going from 1977 (Hunt-Szymanski algorithm) to the present day.
One of the more recent publications on the problem is "Small Longest Tandem Scattered Subsequences" by Russo and Francisco. It claims this time complexity for the algorithm:
O(min{n, ℓ}λ(1 + log(min{λ, ℓ/λ})) + nλ + ℓ)
Where:
n
is the size of the input stringλ
is the size of the longest common scattered sub-sequenceℓ
is the number of pairs of positions in the input string that contain the same letterHowever, for a constrained input size, the time complexity collapses to O(1)
as was pointed out in the comments.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With