Is there any efficient way to find the duplicate substring? Here, duplicate means that two same substring close to each other have the same value without overlap. For example, the source string is:
ABCDDEFGHFGH
'D' and 'FGH' is duplicated. 'F' appear two times in the sequence, however, they are not close to each other, so it does not duplicate. so our algorithm will return ['D', 'FGH']. I want to know whether there exists an elegant algorithm instead the brute force method?
toCharArray(); int count=0; for (int i =0; i < line. length; i++) { int t =0; for (int j = 0; j < temp. length; j++) { if(line[i]==temp[j]){ t=1; } else{ t=0; } } if(t==1){ count=count+1; } } System.
The maximum value of LCSRe(i, j) provides the length of the longest repeating substring and the substring itself can be found using the length and the ending index of the common suffix.
It relates to Longest repeated substring problem, which builds Suffix Tree to provide string searching in linear time and space complexity Θ(n)
Not very efficient (suffix tree/array are better for very large strings), but very short regular expression solution (C#):
string source = @"ABCDDEFGHFGH";
string[] result = Regex
.Matches(source, @"(.+)\1")
.OfType<Match>()
.Select(match => match.Groups[1].Value)
.ToArray();
Explanation
(.+) - group of any (at least 1) characters
\1 - the same group (group #1) repeated
Test
Console.Write(string.Join(", ", result));
Outcome
D, FGH
In case of ambiguity, e.g. "AAAA"
where we can provide "AA"
as well as "A"
the solution performs greedy and thus "AA"
is returned.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With