How to find overlapping matches with a regexp?

>>> import re
>>> match = re.findall(r'\w\w', 'hello')
>>> print(match)
['he', 'll']

Since \w\w matches two word characters, 'he' and 'll' are expected. But why are 'el' and 'lo' not returned as well, even though they also match the regex?

>>> match1 = re.findall(r'el', 'hello')
>>> print(match1)
['el']
futurenext110 asked Jul 11 '12 10:07


2 Answers

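re.findall doesn't return overlapping matches by default: each match consumes its characters, and the next search starts where the previous match ended. This expression does, however: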

>>> re.findall(r'(?=(\w\w))', 'hello')
['he', 'el', 'll', 'lo']

Here (?=...) is a lookahead assertion:

(?=...) matches if ... matches next, but doesn’t consume any of the string. This is called a lookahead assertion. For example, Isaac (?=Asimov) will match 'Isaac ' only if it’s followed by 'Asimov'.
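To see why the lookahead version finds every pair, it can help to print where each match starts. The sketch below uses re.finditer; the helper name overlapping_pairs is made up for illustration. Because the lookahead consumes nothing, each match is empty and the scanner moves forward only one character before trying again, while the capture group holds the two characters that follow.

```python
import re

def overlapping_pairs(text):
    """Return (start, pair) for every two-word-character window in text.

    Hypothetical helper, for illustration only.
    """
    # The lookahead matches an empty string at each position; group 1
    # captures the two characters that follow without consuming them.
    return [(m.start(), m.group(1)) for m in re.finditer(r'(?=(\w\w))', text)]

# Plain \w\w consumes what it matches, so the next search starts after it.
plain = [m.span() for m in re.finditer(r'\w\w', 'hello')]
print(plain)                       # [(0, 2), (2, 4)]
print(overlapping_pairs('hello'))  # [(0, 'he'), (1, 'el'), (2, 'll'), (3, 'lo')]
```

The spans of the plain pattern show the gap: after 'he' ends at index 2, the search never revisits index 1, so 'el' is skipped.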

Otto Allmendinger answered Sep 28 '22 22:09


You can use the third-party regex module (available from PyPI, not part of the standard library), which supports overlapping matches through the overlapped argument.

>>> import regex as re
>>> match = re.findall(r'\w\w', 'hello', overlapped=True)
>>> print(match)
['he', 'el', 'll', 'lo']
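If installing an extra package is not an option, a similar result can be sketched with the standard library alone by attempting a match at every start position. The function name findall_overlapped is hypothetical, and this only approximates what overlapped=True does: it returns the whole match rather than groups, and anchors such as ^ may behave differently.

```python
import re

def findall_overlapped(pattern, text):
    """Collect one match per start position using only the stdlib.

    Hypothetical helper; a rough approximation of regex's overlapped=True.
    """
    compiled = re.compile(pattern)
    results = []
    for pos in range(len(text) + 1):
        # Pattern.match(string, pos) only succeeds if a match begins at pos.
        m = compiled.match(text, pos)
        if m:
            results.append(m.group(0))
    return results

print(findall_overlapped(r'\w\w', 'hello'))  # ['he', 'el', 'll', 'lo']
```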
David C answered Sep 28 '22 22:09