How to use re to find consecutive, repeated chars

Tags: python, regex

I want to find all consecutive, repeated character blocks in a string. For example, consider the following:

s = r'http://www.google.com/search=ooo-jjj'

What I want to find is this: www, ooo and jjj.

I tried to do it like this:

m = re.search(r'(\w)\1\1', s)

But it doesn't seem to work as I expect. Any ideas?

Also, how can I do it in Bash?

Alcott asked Aug 22 '11 12:08


2 Answers

((\w)\2{2,}) matches a run of 3 or more identical consecutive characters:

In [71]: import re
In [72]: s = r'http://www.google.com/search=ooo-jjjj'
In [73]: re.findall(r'((\w)\2{2,})', s)
Out[73]: [('www', 'w'), ('ooo', 'o'), ('jjjj', 'j')]

In [78]: [match[0] for match in re.findall(r'((\w)\2{2,})', s)]
Out[78]: ['www', 'ooo', 'jjjj']

(\w) matches any word character: a letter, a digit, or an underscore.

((\w)\2) matches any word character followed by the same character, since \2 matches the contents of group number 2. Because the parentheses are nested, group number 1 is the whole run and group number 2 is the single character matched by \w.

Putting it all together, ((\w)\2{2,}) matches any word character followed by the same character repeated 2 or more additional times.

In total, that means the regex requires the character to appear 3 or more times in a row.
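The outer group in the pattern above is only there so that re.findall returns the whole run. As a minimal sketch of an alternative, assuming re.finditer is acceptable: group(0) of each match is already the full run, so a single capturing group is enough. The helper name repeated_runs and its min_len parameter are hypothetical, introduced here for illustration.

```python
import re

def repeated_runs(text, min_len=3):
    """Return every run of at least min_len identical word characters.

    Hypothetical helper for illustration; min_len is assumed to be >= 2.
    """
    # (\w) captures one character; \1{n,} requires at least n more copies of it.
    pattern = re.compile(r'(\w)\1{%d,}' % (min_len - 1))
    # group(0) is the whole match, so no outer capturing group is needed.
    return [m.group(0) for m in pattern.finditer(text)]

print(repeated_runs(r'http://www.google.com/search=ooo-jjjj'))
# ['www', 'ooo', 'jjjj']
```

Lowering min_len to 2 would also pick up pairs such as the oo in google.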

unutbu answered Nov 03 '22 10:11


The following code should solve your problem:

import re

s = "abc def aaa bbb ccc def hhh"

# Print each run of exactly three identical word characters.
for match in re.finditer(r"(\w)\1\1", s):
    print(match.group(0))
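One thing to keep in mind: (\w)\1\1 consumes exactly three characters per match, so a longer run is cut short. A small sketch of the difference, assuming the input is the question's URL with one extra j at the end:

```python
import re

s = r'http://www.google.com/search=ooo-jjjj'

# Exactly three characters per match: the fourth 'j' is left over.
exact = [m.group(0) for m in re.finditer(r'(\w)\1\1', s)]

# Three or more: the quantifier is greedy, so the whole run is consumed.
greedy = [m.group(0) for m in re.finditer(r'(\w)\1{2,}', s)]

print(exact)   # ['www', 'ooo', 'jjj']
print(greedy)  # ['www', 'ooo', 'jjjj']
```

If runs longer than three should be reported whole, the {2,} form is probably the one you want.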

rocksportrocker answered Nov 03 '22 11:11