
Finding repetitive substrings

Given some arbitrary string such as

hello hello hello I am I am I am your string string string string of strings

Can I somehow find repetitive substrings delimited by spaces (EDIT)? In this case they would be 'hello', 'I am' and 'string'.

I have been wondering about this for some time, but I still cannot find any real solution. I have also read some articles on this topic and came across suffix trees, but can they help me even though I need to find every repetition, e.g. those with a repetition count higher than two?

If so, is there a library for Python that can handle suffix trees and perform operations on them?

Edit: I am sorry I was not clear enough. To make it clear: I am looking for repetitive substrings, meaning sequences in the string that, in terms of regular expressions, could be expressed with the + or {} quantifiers. So if I had to write a regular expression for the string above, I would write

(hello ){3}(I am ){3}your (string ){4}of strings 
Jendas asked Aug 31 '13 18:08





1 Answer

To find a sequence of two or more characters that occurs two or more times in a row, with the occurrences separated by whitespace, use:

(.{2,}?)(?:\s+\1)+

Here's a working example with your test string: http://bit.ly/17cKX62
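For trying the pattern locally, here is a minimal Python sketch using the standard re module. The names REPEAT and find_repeats are made up for illustration and are not part of the original answer. The count is obtained by re-scanning each match for the captured unit, which is a simplification that works for inputs like the test string.

```python
import re

# The pattern from the answer: a unit of 2+ characters (captured lazily),
# followed by one or more whitespace-separated copies of itself.
REPEAT = re.compile(r"(.{2,}?)(?:\s+\1)+")


def find_repeats(text):
    """Return (unit, count) pairs for each run of repeated units in text."""
    results = []
    for match in REPEAT.finditer(text):
        unit = match.group(1)
        # Count how often the unit occurs inside the whole matched run.
        count = len(re.findall(re.escape(unit), match.group(0)))
        results.append((unit, count))
    return results


sample = ("hello hello hello I am I am I am your "
          "string string string string of strings")
print(find_repeats(sample))
# [('hello', 3), ('I am', 3), ('string', 4)]
```

Note that . does not match newlines by default in Python, so a repeated unit is not captured across line breaks unless re.DOTALL is passed.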

EDIT: made the quantifier in the capture group reluctant by adding ? so that it captures the shortest possible unit (i.e. it now captures "string" and not "string string")

EDIT 2: added a required whitespace delimiter (\s+) between repetitions for cleaner results
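If the goal is the compressed expression from the question, a regex is not strictly required. The sketch below swaps in a different technique: a greedy scan over whitespace-separated words, rather than the pattern above or a suffix tree. compress_repeats is a hypothetical helper; it assumes single-space output, does not escape regex metacharacters in the words, and at each position picks the phrase length whose repetitions cover the most words.

```python
def compress_repeats(text):
    """Collapse runs of repeated word phrases into '(phrase ){n}' groups."""
    words = text.split()
    n = len(words)
    parts = []
    i = 0
    while i < n:
        best_size, best_count = 1, 1
        # Try every phrase length that could repeat at least once from here.
        for size in range(1, (n - i) // 2 + 1):
            unit = words[i:i + size]
            count = 1
            # Count consecutive copies of the phrase.
            while words[i + count * size:i + (count + 1) * size] == unit:
                count += 1
            # Keep the candidate that covers the most words (ties: shorter phrase).
            if count > 1 and count * size > best_count * best_size:
                best_size, best_count = size, count
        phrase = " ".join(words[i:i + best_size])
        if best_count > 1:
            parts.append(f"({phrase} ){{{best_count}}}")
        else:
            parts.append(phrase + " ")
        i += best_size * best_count
    return "".join(parts).rstrip()


sample = ("hello hello hello I am I am I am your "
          "string string string string of strings")
print(compress_repeats(sample))
# (hello ){3}(I am ){3}your (string ){4}of strings
```

This scan is quadratic or worse in the number of words, so it is meant for short strings; for long inputs a suffix-based structure, as raised in the question, would be the more scalable direction.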

Ray Waldin answered Nov 03 '22 20:11
