A period
p
of a stringw
is any positive integerp
such thatw[i]=w[i+p]
whenever both sides of this equation are defined. Letper(w)
denote the size of the smallest period ofw
. We say that a stringw
is periodic iffper(w) <= |w|/2
.
So informally a periodic string is just a string that is made up from a prefix repeated at least twice. The only complication is that at the end of the string we don't require a full copy of the prefix.
For, example consider the string x = abcab
. per(abcab) = 3
as x[1] = x[1+3] = a
, x[2]=x[2+3] = b
and there is no smaller period. The string abcab
is therefore not periodic. However, the string ababa
is periodic as per(ababa) = 2
.
As more examples, abcabca
, ababababa
and abcabcabc
are also periodic.
Is there a regex to determine if a string is periodic or not?
I don't really mind which flavor of regex but if it makes a difference, anything that Python re
supports.
Use the escape character \ to match a period with regexwithin a regular expression to match a literal period since, by default, the dot . is a metacharacter in regex that matches any character except a newline.
(?= regex_here) is a positive lookahead. It is a zero-width assertion, meaning that it matches a location that is followed by the regex contained within (?=
So, yes, regular expressions really only apply to strings. If you want a more complicated FSM, then it's possible to write one, but not using your local regex engine. Save this answer.
What you need is backreference
\b(\w*)(\w+\1)\2+\b
This matches even abcabca
and ababababa
.
\1
and \2
are used to match the first and second capturing groups, respectively.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With