
regular expression matches in Python

Tags: python, regex

I have a question regarding regular expressions. When using the alternation (|) construct:

$ python
Python 2.7.3 (default, Sep 26 2012, 21:51:14) 
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> for mo in re.finditer('a|ab', 'ab'):
...     print mo.start(0), mo.end(0)
... 
0 1

we get only one match, which is expected: the leftmost branch that is accepted gets reported. My question is whether it is possible, and if so how, to construct a regular expression that would yield both (0,1) and (0,2). And how can this be done in general for any regex of the form r1 | r2 | ... | rn?

Similarly, is it possible to achieve this for the *, +, and ? constructs? By default:

>>> for mo in re.finditer('a*', 'aaa'):
...     print mo.start(0), mo.end(0)
... 
0 3
3 3
>>> for mo in re.finditer('a+', 'aaa'):
...     print mo.start(0), mo.end(0)
... 
0 3
>>> for mo in re.finditer('a?', 'aaa'):
...     print mo.start(0), mo.end(0)
... 
0 1
1 2
2 3
3 3

My second question: why, with * and ?, do empty strings match at the end of the string but nowhere else?

EDIT:

I think I realize now that both questions were nonsense: as @mgilson said, re.finditer only returns non-overlapping matches, and I guess that whenever a regular expression accepts (part of) a string, the engine stops searching there. Thus, this is impossible with the default settings of the Python matching engine.

Still, I wonder: if Python uses backtracking for regex matching, it should not be very difficult to make it continue searching after accepting a string. But this would break the usual behavior of regular expressions.

EDIT2:

This is possible in Perl. See answer by @Qtax below.

asked Nov 03 '22 05:11 by Timo


1 Answer

I don't think this is possible. The docs for re.finditer state:

Return an iterator yielding MatchObject instances over all non-overlapping matches for the RE pattern in string

(emphasis mine, on "non-overlapping")
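To illustrate what "non-overlapping" means in practice, here is a minimal sketch. The helper name overlapping_spans is made up for this example, and wrapping the pattern in a capturing lookahead is a common workaround rather than anything the docs prescribe. It lets finditer try every start position, but each position still reports only the first alternative the engine accepts, so it does not solve the a|ab case:

```python
import re

def overlapping_spans(pattern, text):
    """Hypothetical helper: one match per start position, overlaps allowed.

    The lookahead itself consumes nothing, so finditer moves forward one
    character at a time; group 1 holds what the pattern matched there.
    """
    lookahead = re.compile('(?=(' + pattern + '))')
    return [m.span(1) for m in lookahead.finditer(text)]

# Plain finditer skips the match starting inside an earlier match.
print([m.span() for m in re.finditer('aa', 'aaa')])
# The lookahead version also reports the overlapping match at index 1.
print(overlapping_spans('aa', 'aaa'))
# Still only one match per start position: the 'ab' branch is never reported.
print(overlapping_spans('a|ab', 'ab'))
```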


As for your other question, about why empty strings don't match elsewhere: I think it is because the rest of the string has already been matched elsewhere (at every other position the greedy quantifier consumes an 'a' rather than matching the empty string), and finditer only yields non-overlapping matches (see the answer to the first part ;-).
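If you really need every (start, end) pair that the pattern accepts, one way around finditer is to test each substring yourself. This is only a brute-force sketch, not something the re module offers directly: the function name all_match_spans is made up here, it assumes Python 3.4+ (for fullmatch), and it makes on the order of n^2 matching calls, so it is only reasonable for short strings.

```python
import re

def all_match_spans(pattern, text):
    """Hypothetical helper: every (start, end) such that the whole
    pattern matches exactly text[start:end]."""
    compiled = re.compile(pattern)
    spans = []
    for start in range(len(text) + 1):
        for end in range(start, len(text) + 1):
            # fullmatch(text, pos, endpos) succeeds only if the match
            # covers the entire region from pos to endpos.
            if compiled.fullmatch(text, start, end):
                spans.append((start, end))
    return spans

print(all_match_spans('a|ab', 'ab'))
print(all_match_spans('a+', 'aaa'))
```

For 'a|ab' on 'ab' this gives both (0, 1) and (0, 2), which is what the question asks for. Because every substring is checked independently, it works the same way for *, +, and ?.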

answered Nov 15 '22 05:11 by mgilson