Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Duplicate substring searching

Is there any efficient way to find the duplicate substring? Here, duplicate means that two same substring close to each other have the same value without overlap. For example, the source string is:

ABCDDEFGHFGH

'D' and 'FGH' is duplicated. 'F' appear two times in the sequence, however, they are not close to each other, so it does not duplicate. so our algorithm will return ['D', 'FGH']. I want to know whether there exists an elegant algorithm instead the brute force method?

like image 982
maple Avatar asked Dec 22 '16 11:12

maple


People also ask

How do you find duplicate substrings in a string in Java?

toCharArray(); int count=0; for (int i =0; i < line. length; i++) { int t =0; for (int j = 0; j < temp. length; j++) { if(line[i]==temp[j]){ t=1; } else{ t=0; } } if(t==1){ count=count+1; } } System.

How do you find the longest repeated substring?

The maximum value of LCSRe(i, j) provides the length of the longest repeating substring and the substring itself can be found using the length and the ending index of the common suffix.


2 Answers

It relates to Longest repeated substring problem, which builds Suffix Tree to provide string searching in linear time and space complexity Θ(n)

like image 197
Muhammad Faizan Uddin Avatar answered Sep 20 '22 13:09

Muhammad Faizan Uddin


Not very efficient (suffix tree/array are better for very large strings), but very short regular expression solution (C#):

  string source = @"ABCDDEFGHFGH";

  string[] result = Regex
    .Matches(source, @"(.+)\1")
    .OfType<Match>()
    .Select(match => match.Groups[1].Value)
    .ToArray(); 

Explanation

(.+) - group of any (at least 1) characters
\1   - the same group (group #1) repeated 

Test

  Console.Write(string.Join(", ", result));     

Outcome

  D, FGH

In case of ambiguity, e.g. "AAAA" where we can provide "AA" as well as "A" the solution performs greedy and thus "AA" is returned.

like image 33
Dmitry Bychenko Avatar answered Sep 18 '22 13:09

Dmitry Bychenko