
In Python, how can I loop over all the matches of a regular expression on a string?

I want to do something (more than just substitution) with substrings matching a pattern in a longer string. If an assignment were an expression returning a value, as in C and most other programming languages, this would be (using C syntax with Python semantics):

while ( match = re.search( pat, str ) ) {
    /* do something to the string, using the match object,
       in addition to removing or replacing the substring
    */
}

or more verbosely, avoiding the use of an assignment as an expression:

for ( match = re.search( pat, str );
      match;
      match = re.search( pat, str ) ) {
   /* do something to the string, using the match object */
}

At least one of these is possible in most programming languages (C, C++, Java, Perl, JavaScript, ...), but neither seems to be possible in Python. Is there a Pythonic equivalent (not involving a kludgy mess with a break or continue statement)?

zizzler asked Jul 22 '17 13:07




1 Answer

You may be looking for re.finditer. The documentation describes it as follows:

Return an iterator yielding match objects over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result unless they touch the beginning of another match.

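#!/usr/bin/env python3

import re

s = "abcabcabc"
# Raw string so the backslash in \w reaches the regex engine unchanged.
for m in re.finditer(r"(\w)", s):
    # m.groups() is a tuple of the captured groups for this match.
    print(m.groups())

$ ./t.py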
('a',)
('b',)
('c',)
('a',)
('b',)
('c',)
('a',)
('b',)
('c',)
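
If you need more than the captured text, each match object also carries its position, and on Python 3.8 or later an assignment expression (:=) gets close to the C-style loop from the question. Here is a minimal sketch of both; the helper names describe_matches and strip_matches are made up for illustration and are not part of the re module:

```python
import re


def describe_matches(pattern, text):
    """Hypothetical helper: (start, end, matched_text) for each match."""
    # finditer yields match objects left to right, without overlaps.
    return [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, text)]


def strip_matches(pattern, text):
    """Hypothetical helper: remove matches one at a time, recording each.

    Mirrors `while (match = re.search(pat, str))` using the assignment
    expression (:=), which requires Python 3.8 or later.
    Assumes the pattern never matches the empty string; otherwise the
    text would stop shrinking and the loop would not terminate.
    """
    removed = []
    while (m := re.search(pattern, text)):
        removed.append(m.group())
        # Cut the matched span out of the string, then search again.
        text = text[:m.start()] + text[m.end():]
    return text, removed


if __name__ == "__main__":
    sample = "a1bb22ccc333"
    print(describe_matches(r"\d+", sample))
    print(strip_matches(r"\d+", sample))
```

The finditer version scans the original string once and is usually the simpler choice. The while loop re-searches after every change, so it only makes sense when the loop body really does modify the string being searched.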
Sinan Ünür answered Nov 15 '22 16:11