python 3 regex - find all overlapping matches' start and end index in a string

Question

This was my original approach:

import re

string = '1'*15
result = re.finditer(r'(?=11111)', string)      # overlapped = True
                                                # Doesn't work for me
for i in result:                                # python 3.5
    print(i.start(), i.end())

It finds all overlapping matches, but fails to get the right end index. The output:

1 <_sre.SRE_Match object; span=(0, 0), match=''>
2 <_sre.SRE_Match object; span=(1, 1), match=''>
3 <_sre.SRE_Match object; span=(2, 2), match=''>
4 <_sre.SRE_Match object; span=(3, 3), match=''>
(and so on..)

My question: how can I find all overlapping matches and also get the correct start and end index of each one?

Wiktor Stribiżew · Accepted Answer

The problem is that a lookahead is a zero-width assertion: it consumes no text (i.e. it adds nothing to the match result) and only matches a position in the string. As a result, every match starts and ends at the same location.
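As a small illustration of that point (a sketch using the question's own pattern and string; the variable names are only for this example), each match produced by the bare lookahead has an empty span:

```python
import re

s = '1' * 15

# The lookahead succeeds at every position followed by five 1s,
# but the match itself contains no characters.
zero_width_spans = [m.span() for m in re.finditer(r'(?=11111)', s)]

first = re.search(r'(?=11111)', s)
print(first.span())         # (0, 0)
print(repr(first.group()))  # ''
print(len(zero_width_spans))  # 11
```

So the positions are found correctly; only the end index is missing, because nothing was consumed.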

You need to wrap the pattern inside the lookahead in a capturing group (i.e. (?=(11111))) and read the start and end of group 1 (with i.start(1) and i.end(1)):

import re
s = '1'*15     
result = re.finditer(r'(?=(11111))', s)

for i in result:
    print(i.start(1), i.end(1))

The output is:

0 5
1 6
2 7
3 8
4 9
5 10
6 11
7 12
8 13
9 14
10 15
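If this is needed in more than one place, the same idea could be wrapped in a small helper. The following is only a sketch: the name overlapping_spans is made up for this example, and it assumes the pattern is a plain sub-expression with no numbered backreferences and no leading inline flags such as (?i), since wrapping it shifts group numbers and moves it away from the start of the expression.

```python
import re

def overlapping_spans(pattern, text):
    """Return (start, end) of every match of pattern in text, overlaps included.

    Hypothetical helper: wraps pattern as (?=(pattern)) so that group 1
    holds the text the lookahead saw at each position.
    """
    wrapped = re.compile(r'(?=(' + pattern + r'))')
    return [m.span(1) for m in wrapped.finditer(text)]

spans = overlapping_spans('11111', '1' * 15)
print(spans[:3])   # [(0, 5), (1, 6), (2, 7)]
print(len(spans))  # 11
```

Note that this reports at most one match per start position, so for a variable-length pattern only the match the regex engine prefers at that position is returned.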

Tags:

python

regex
