Consider the string
aabaabaabaabaab
Clearly this string is made of 5 adjacent occurences of aab
, so I want my regex to match aab
.
To elaborate: aabaab
wouldn't be an acceptable output because it's made by repeating aab
. But aab
is a valid result because it's not made of a repeated shorter string.
For the sake of the question, let us assume that there can be additional text around the repeated segment (for example 11aabaabaabaabaab22
or even xaabaabaabaabaabaa
). Therefore it's not possible to anchor the regex with ^
or $
.
Failed idea #1: (.+?)\1+
This captures aa
instead of the expected aab
.
Failed idea #2: (.+)\1+
This captures aabaab
.
Is it possible to do this with pure regex? If yes, is it possible without dynamic-width lookbehind?
The maximum value of LCSRe(i, j) provides the length of the longest repeating substring and the substring itself can be found using the length and the ending index of the common suffix.
Longest Repeating Substring in C++ Suppose we have a string S, we have to find the length of the longest repeating substring(s). We will return 0 if no repeating substring is present. So if the string is like “abbaba”, then the output will be 2. As the longest repeating substring is “ab” or “ba”.
You can use two lookaheads, the first one searches the longest pattern and the second searches the smallest. The second pattern repeated has to end (at least) at the same position or after the first pattern repeated. To check that, you have to capture the end of the string in the first lookahead and to use a backreference to this capture in the second lookahead.
(?=(.+)\1+(.*))(?=(.+?)\3+\2$)\3+
demo
The result is in the group 3
also:
(?=(.+)\1+(.*))(.+?)\3+(?=\2$)\3*
Note that these two regex patterns give the result for one position in a string. If you want to know what is the shortest pattern that is the longest substring once repeated for all the string, you have to find all of them and to select the longest susbstring with code. You can use a pattern for overlapping results to do that:
(?=(.+)\1+(.*))(?=(.+?)\3+\2$)(?=(\3+))
(using the group 4)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With