
Python - Find sequence of same characters

Tags:

python

regex

I'm trying to use a regex to match runs of one or more occurrences of the same character in a string.

Example:

string = "55544355"
# The regex should retrieve sequences "555", "44", "3", "55"

Can I have a few tips?

Eduardo Almeida asked Dec 05 '22 00:12


1 Answer

You can use re.findall() and the ((.)\2*) regular expression:

>>> import re
>>> [item[0] for item in re.findall(r"((.)\2*)", string)]
['555', '44', '3', '55']

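The key part is inside the outer capturing group: (.)\2*. Here (.) captures a single character, and \2 refers back to that character by its group number. The group number is 2 because the outer capturing group is number 1. The * means zero or more repetitions, so a run of length one (like "3") still matches. Because the pattern contains two groups, re.findall() returns one tuple per match, which is why the list comprehension takes item[0].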
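To make the group numbering concrete, here is a small sketch of what re.findall() returns for this pattern before the first element is picked out. The variable names pairs and runs are only illustrative.

```python
import re

string = "55544355"

# With two capturing groups, re.findall() returns one tuple per match:
# (group 1 = the whole run, group 2 = the single repeated character).
pairs = re.findall(r"((.)\2*)", string)
print(pairs)  # [('555', '5'), ('44', '4'), ('3', '3'), ('55', '5')]

# Keeping only group 1 gives the runs themselves.
runs = [whole for whole, _char in pairs]
print(runs)  # ['555', '44', '3', '55']
```

The second tuple element is the repeated character, so the same call also gives character/length pairs if you need them.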

You could also solve it with a single capturing group and re.finditer(), where group(0) is the whole match:

>>> [item.group(0) for item in re.finditer(r"(.)\1*", string)]
['555', '44', '3', '55']
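If you need this in more than one place, the re.finditer() version could be wrapped in a small helper, roughly as sketched below. The names find_runs and RUN_PATTERN are hypothetical, and the re.DOTALL flag is an addition that is not in the answer: by default "." does not match a newline, so runs of newlines would be silently skipped without it.

```python
import re

# Compiled once and reused. re.DOTALL makes "." match newlines as well
# (an assumption about what you want; drop the flag to skip newlines).
RUN_PATTERN = re.compile(r"(.)\1*", re.DOTALL)


def find_runs(text):
    """Return each maximal run of a repeated character, in order."""
    return [match.group(0) for match in RUN_PATTERN.finditer(text)]


print(find_runs("55544355"))  # ['555', '44', '3', '55']
print(find_runs("aa\n\nb"))   # ['aa', '\n\n', 'b']
```

An empty input simply yields an empty list, since the pattern needs at least one character to match.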
alecxe answered Dec 06 '22 13:12