
Detect repetitions in string

Tags: python, regex

I have a simple problem, but can't come up with a simple solution :)

Let's say I have a string. I want to detect if there is a repetition in it.

I'd like:

"blablabla" # => (bla, 3)

"rablabla"  # => (bla, 2)

The thing is I don't know what pattern I am searching for (I don't have "bla" as input).

Any idea?

EDIT:
Seeing the comments, I think I should clarify what I have in mind:

  • In a string, there is either a pattern that is repeated or not.
  • The repeated pattern can be of any length.

If there is a pattern, it is repeated over and over again until the end. But the string can end in the middle of the pattern.

Example:

"testblblblblb" # => ("bl",4) 
jlengrand asked Jan 31 '12 12:01


1 Answer

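import re

def repetitions(s):
    # (.+?) lazily captures the shortest candidate unit; \1+ requires one or more further copies
    r = re.compile(r"(.+?)\1+")
    for match in r.finditer(s):
        # Integer division keeps the count an int on Python 3 as well
        yield (match.group(1), len(match.group(0)) // len(match.group(1)))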

This finds all non-overlapping repeated substrings, using the shortest possible unit of repetition:

>>> list(repetitions("blablabla"))
[('bla', 3)]
>>> list(repetitions("rablabla"))
[('abl', 2)]
>>> list(repetitions("aaaaa"))
[('a', 5)]
>>> list(repetitions("aaaaablablabla"))
[('a', 5), ('bla', 3)]
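
For the case described in the question's EDIT (the pattern repeats until the end of the string, and the string may stop in the middle of the pattern), the regex above does not check what comes after the match. A brute-force sketch of that stricter check could look like the following. The name trailing_repetition is made up for illustration, and "earliest start, then shortest unit" is an assumed tie-breaking rule, not something the question specifies.

```python
def trailing_repetition(s):
    """Return (unit, count) for a unit that repeats until the end of s.

    The earliest start position wins, then the shortest unit. The last
    copy of the unit may be cut off. Returns None if nothing repeats.
    """
    for start in range(len(s)):
        tail = s[start:]
        # Require at least two full copies, so the unit is at most half the tail.
        for size in range(1, len(tail) // 2 + 1):
            unit = tail[:size]
            # Build enough copies to cover the tail, then compare prefixes;
            # this accepts a tail that ends in the middle of the unit.
            if (unit * (len(tail) // size + 1)).startswith(tail):
                return unit, len(tail) // size
    return None


print(trailing_repetition("testblblblblb"))  # ('bl', 4)
print(trailing_repetition("blablabla"))      # ('bla', 3)
print(trailing_repetition("abc"))            # None
```

Like the regex version, this returns ('abl', 2) for "rablabla", because the repetition that starts earliest is reported. It tries every start position and unit length, so it is only meant for short strings.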
Tim Pietzcker answered Sep 21 '22 09:09