How to find overlapping matches with a regexp?

Tags: python, regex, overlapping

    >>> import re
    >>> match = re.findall(r'\w\w', 'hello')
    >>> print(match)
    ['he', 'll']

Since \w\w matches two word characters, 'he' and 'll' are expected. But why are 'el' and 'lo' not returned as well, even though they match the regex on their own?

    >>> match1 = re.findall(r'el', 'hello')
    >>> print(match1)
    ['el']

asked Jul 11 '12 10:07

futurenext110

2 Answers

findall doesn't yield overlapping matches by default: once a match is found, the search resumes after the end of that match. This expression does, however:

    >>> re.findall(r'(?=(\w\w))', 'hello')
    ['he', 'el', 'll', 'lo']

Here (?=...) is a lookahead assertion:

(?=...) matches if ... matches next, but doesn’t consume any of the string. This is called a lookahead assertion. For example, Isaac (?=Asimov) will match 'Isaac ' only if it’s followed by 'Asimov'.
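To make the difference concrete, here is a minimal sketch using only the standard `re` module. It compares where the plain pattern matches with where the lookahead version matches, and shows why the inner capturing group is needed. The variable names are only illustrative.

```python
import re

text = 'hello'

# Plain pattern: each match consumes two characters, so the next
# search starts after them and 'el' / 'lo' are never tried.
consumed_spans = [m.span() for m in re.finditer(r'\w\w', text)]
print(consumed_spans)  # [(0, 2), (2, 4)]

# Lookahead without a group: the matches are zero-width, so
# findall only has empty strings to return.
zero_width = re.findall(r'(?=\w\w)', text)
print(zero_width)  # ['', '', '', '']

# Lookahead with a capturing group: the match itself is still
# zero-width, but the group records the two characters ahead.
overlapping = [(m.start(), m.group(1)) for m in re.finditer(r'(?=(\w\w))', text)]
print(overlapping)  # [(0, 'he'), (1, 'el'), (2, 'll'), (3, 'lo')]
```

Because the lookahead consumes nothing, the scanner only advances one position at a time, which is what lets every start position be tested.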

answered Sep 28 '22 22:09

Otto Allmendinger

You can use the third-party regex module (a separate package, not part of the standard library), which supports overlapping matches.

    >>> import regex as re
    >>> match = re.findall(r'\w\w', 'hello', overlapped=True)
    >>> print(match)
    ['he', 'el', 'll', 'lo']
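If installing an extra package is not an option, a similar result can be sketched with the standard `re` module by trying the pattern at every start position. `find_overlapping` below is a hypothetical helper written for illustration, not part of either module, and it is only checked against the simple example from the question.

```python
import re

def find_overlapping(pattern, text):
    """Hypothetical helper: collect the match found at each start position."""
    compiled = re.compile(pattern)
    results = []
    for pos in range(len(text) + 1):
        # Pattern.match(string, pos) only succeeds if a match begins at pos.
        m = compiled.match(text, pos)
        if m:
            results.append(m.group(0))
    return results

pairs = find_overlapping(r'\w\w', 'hello')
print(pairs)  # ['he', 'el', 'll', 'lo']
```

This trades one scan for one match attempt per position, which is fine for short strings but may be slow on large inputs.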

answered Sep 28 '22 22:09

David C
