Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

A regex to detect periodic strings

Tags:

regex

A period p of a string w is any positive integer p such that w[i]=w[i+p] whenever both sides of this equation are defined. Let per(w) denote the size of the smallest period of w . We say that a string w is periodic iff per(w) <= |w|/2.

So informally a periodic string is just a string that is made up from a prefix repeated at least twice. The only complication is that at the end of the string we don't require a full copy of the prefix.

For, example consider the string x = abcab. per(abcab) = 3 as x[1] = x[1+3] = a, x[2]=x[2+3] = b and there is no smaller period. The string abcab is therefore not periodic. However, the string ababa is periodic as per(ababa) = 2.

As more examples, abcabca, ababababa and abcabcabc are also periodic.

Is there a regex to determine if a string is periodic or not?

I don't really mind which flavor of regex but if it makes a difference, anything that Python re supports.

like image 261
graffe Avatar asked Jul 12 '16 07:07

graffe


People also ask

How to detect a period in regex?

Use the escape character \ to match a period with regexwithin a regular expression to match a literal period since, by default, the dot . is a metacharacter in regex that matches any character except a newline.

What is?= in regex?

(?= regex_here) is a positive lookahead. It is a zero-width assertion, meaning that it matches a location that is followed by the regex contained within (?=

Does regex work only with strings?

So, yes, regular expressions really only apply to strings. If you want a more complicated FSM, then it's possible to write one, but not using your local regex engine. Save this answer.


1 Answers

What you need is backreference

\b(\w*)(\w+\1)\2+\b

This matches even abcabca and ababababa.

\1 and \2 are used to match the first and second capturing groups, respectively.

like image 167
horcrux Avatar answered Nov 15 '22 13:11

horcrux