I have a question regarding regular expressions. When using the or construct (|):
$ python
Python 2.7.3 (default, Sep 26 2012, 21:51:14)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> for mo in re.finditer('a|ab', 'ab'):
...     print mo.start(0), mo.end(0)
...
0 1
We get only one match, which is expected, since the first (leftmost) branch that gets accepted is the one reported. My question: is it possible to construct a regular expression that would yield both (0,1) and (0,2), and if so, how? And how can that be done in general for any regex of the form r1 | r2 | ... | rn?
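To make the leftmost-first behaviour concrete, here is a small sketch showing that the order of the alternatives decides which branch is reported at a given start position. It uses function-style print calls, unlike the 2.7 session above, and the variable names are only for illustration.

```python
import re

text = 'ab'

# Alternatives are tried left to right; the first one that succeeds is reported.
span_a_first = re.match('a|ab', text).span()
span_ab_first = re.match('ab|a', text).span()

print(span_a_first)   # (0, 1)
print(span_ab_first)  # (0, 2)
```

Swapping the branches changes which single span is reported, but neither ordering reports both.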
Similarly, is it possible to achieve this for the *, +, and ? constructs? By default:
>>> for mo in re.finditer('a*', 'aaa'):
...     print mo.start(0), mo.end(0)
...
0 3
3 3
>>> for mo in re.finditer('a+', 'aaa'):
...     print mo.start(0), mo.end(0)
...
0 3
>>> for mo in re.finditer('a?', 'aaa'):
...     print mo.start(0), mo.end(0)
...
0 1
1 2
2 3
3 3
Second question: why do empty strings match at the end of the string but nowhere else, as is the case with * and ? above?
EDIT:
I think I realize now that both questions were nonsense: as @mgilson said, re.finditer only returns non-overlapping matches, and I guess that whenever a regular expression accepts (part of) a string, it terminates the search. Thus, this is impossible with the default settings of the Python matching engine.
Still, if Python uses backtracking for regex matching, I wonder whether it would really be very difficult to make it continue searching after accepting a string. But this would break the usual behavior of regular expressions.
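For what it's worth, every accepted span can be collected without touching the engine by brute force: test each (start, end) pair and keep those where the pattern matches the whole slice. The sketch below is only an illustration of that idea, not how the re module works internally. The helper name all_spans is made up, and it assumes Python 3.4+ for fullmatch (which the 2.7 interpreter above does not have).

```python
import re

def all_spans(pattern, text):
    """Return every (start, end) for which the pattern matches text[start:end] in full."""
    compiled = re.compile(pattern)
    spans = []
    for start in range(len(text) + 1):
        for end in range(start, len(text) + 1):
            # With pos/endpos, fullmatch behaves as if the string ended at `end`.
            if compiled.fullmatch(text, start, end):
                spans.append((start, end))
    return spans

print(all_spans('a|ab', 'ab'))  # [(0, 1), (0, 2)]
print(all_spans('a+', 'aaa'))   # [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```

This makes a quadratic number of match attempts, so it is only reasonable for short inputs.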
EDIT2:
This is possible in Perl. See answer by @Qtax below.
I don't think this is possible. The docs for re.finditer state:
Return an iterator yielding MatchObject instances over all non-overlapping matches for the RE pattern in string
(emphasis on "non-overlapping" is mine)
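A common way around the non-overlapping restriction is to wrap the pattern in a lookahead, so that each match is zero-width and the text of interest sits in a capture group. The sketch below (the helper name spans_per_start is made up) shows what this gives and where it stops: there is one result per start position, but still only the first accepted alternative at each start.

```python
import re

def spans_per_start(pattern, text):
    # The lookahead consumes nothing, so finditer tries every start position;
    # group 1 holds what the original pattern would have matched there.
    wrapped = re.compile('(?=(%s))' % pattern)
    return [m.span(1) for m in wrapped.finditer(text)]

print(spans_per_start('a+', 'aaa'))   # [(0, 3), (1, 3), (2, 3)]
print(spans_per_start('a|ab', 'ab'))  # [(0, 1)]
```

So overlapping matches with different starts are reachable this way, while (0,1) and (0,2) together are not.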
In answer to your other question about why empty strings don't match elsewhere, I think it is because the rest of the string is already matched someplace else, and finditer only gives non-overlapping matches of the pattern (see the answer to the first part ;-).
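The scanning behaviour can be imitated with a short loop, which may make the empty match at the end easier to see. This is a simplified model, not the actual implementation of finditer, and the helper name scan is made up: search from the current position, record the match, then resume where it ended (stepping one character forward after an empty match).

```python
import re

def scan(pattern, text):
    compiled = re.compile(pattern)
    spans = []
    pos = 0
    while pos <= len(text):
        m = compiled.search(text, pos)
        if m is None:
            break
        spans.append(m.span())
        # Resume at the end of the match; move past an empty match by one character.
        pos = m.end() if m.end() > m.start() else m.end() + 1
    return spans

for pattern in ('a*', 'a+', 'a?'):
    print(pattern, scan(pattern, 'aaa'))
# a* [(0, 3), (3, 3)]
# a+ [(0, 3)]
# a? [(0, 1), (1, 2), (2, 3), (3, 3)]
```

At positions 0 to 2 a greedy * or ? prefers to consume an 'a', so the empty match is never the one reported there. Only at position 3, where no 'a' is left, does the pattern accept the empty string.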