This was my original approach:
import re

string = '1' * 15
result = re.finditer(r'(?=11111)', string)  # meant to act like overlapped=True
# Doesn't work for me (Python 3.5)
for i in result:
    print(i.start(), i.end())
It finds all overlapping matches, but fails to get the right end index. The output:
1 <_sre.SRE_Match object; span=(0, 0), match=''>
2 <_sre.SRE_Match object; span=(1, 1), match=''>
3 <_sre.SRE_Match object; span=(2, 2), match=''>
4 <_sre.SRE_Match object; span=(3, 3), match=''>
(and so on...)
My question: How can I find all overlapping matches and also get the correct start and end indices?
The problem is that a lookahead is a zero-width assertion: it consumes no text (i.e. adds nothing to the match result) and only tests a position in the string. Thus, each of your matches starts and ends at the same position.
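To see the difference concretely, here is a small comparison (my own illustration, not part of the original answer) of a normal consuming pattern and a bare lookahead on the same 15-character string. The consuming pattern reports real spans but skips overlaps; the lookahead finds every position but reports empty spans.

```python
import re

s = '1' * 15

# A plain (consuming) pattern: each match eats its 5 characters, so the
# search resumes after them and overlapping occurrences are skipped.
consuming = [m.span() for m in re.finditer(r'11111', s)]

# A bare lookahead: it only tests the current position, so every match is
# empty and its span starts and ends at the same index.
lookahead = [m.span() for m in re.finditer(r'(?=11111)', s)]

print(consuming)       # [(0, 5), (5, 10), (10, 15)]
print(lookahead[:3])   # [(0, 0), (1, 1), (2, 2)]
print(len(lookahead))  # 11
```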
You need to wrap the pattern inside the lookahead in a capturing group, i.e. (?=(11111)), and then read the start and end of group 1 with i.start(1) and i.end(1):
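import re

s = '1' * 15
# The lookahead still consumes nothing, but group 1 records
# the 5 characters it matched at each position
result = re.finditer(r'(?=(11111))', s)
for i in result:
    print(i.start(1), i.end(1))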
See the Python demo; its output (Python 3) is:
0 5
1 6
2 7
3 8
4 9
5 10
6 11
7 12
8 13
9 14
10 15
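If you need this for more than one pattern, the same trick can be wrapped in a small helper. This is only a sketch: the name `overlapping_spans` and its `flags` parameter are hypothetical, not part of the `re` module. It assumes the pattern does not begin with global inline flags such as `(?i)`, because those would no longer be at the start once wrapped; pass them through `flags` instead.

```python
import re
from typing import List, Tuple


def overlapping_spans(pattern: str, text: str, flags: int = 0) -> List[Tuple[int, int]]:
    """Return (start, end) for every overlapping match of `pattern` in `text`.

    Hypothetical helper: wraps the pattern as (?=(...)) so the engine moves
    forward one position at a time, while group 1 still captures the text the
    pattern matched at that position.
    """
    # The wrapper's "(" opens before any group inside `pattern`,
    # so it is always group 1.
    wrapped = r'(?=(' + pattern + r'))'
    return [m.span(1) for m in re.finditer(wrapped, text, flags)]


print(overlapping_spans('11111', '1' * 15)[:3])  # [(0, 5), (1, 6), (2, 7)]
print(overlapping_spans('aba', 'ababa'))         # [(0, 3), (2, 5)]
```

Because each starting position is tried once, you get at most one match per position: for a greedy pattern such as `r'1+'` on `'111'` that is the longest one, giving `(0, 3)`, `(1, 3)`, `(2, 3)`.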