Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

too many indentation when checking a string with many regular expressions serially in python

I get deep indentation when I write code like below

match = re.search(some_regex_1, s)
if match:
    # do something with match data
else:
    match = re.search(some_regex_2, s)
    if match:
        # do something with match data
    else:
        match = re.search(soem_regex_3, s)
        if match:
            # do something with match data
        else:
            # ...
            # and so on

I tried to rewrite as:

if match = re.search(some_regex_1, s):
    # ...
elif match = re.search(some_regex_2, s):
    # ...
elif ....
    # ...
...

but Python doesn't allow that syntax. What should I do to avoid deep indentation in this case?

like image 422
Le Curious Avatar asked May 19 '26 11:05

Le Curious


2 Answers

regexes = (regex1, regex2, regex3)
for regex in regexes:
    match = re.search(regex, s)
    if match:
        #do stuff
        break

Alternatively (more advanced):

def process1(match_obj):
    #handle match 1

def process2(match_obj):
    #handle match 2

def process3(match_obj):
    #handle match 3
.
.
.
handler_map = ((regex1, process1), (regex2, process2), (regex3, process3))
for regex, handler in handler_map:
    match = re.search(regex, s)
    if match:
        result = handler(match)
        break
else:
    #else condition if no regex matches
like image 103
Silas Ray Avatar answered May 22 '26 00:05

Silas Ray


If you can use finditer() instead of search() (most of the time you can), you could join all your regexes into one and use symbolic group names. Here is an example:

import re

regex = """
   (?P<number> \d+ ) |
   (?P<word> \w+ ) |
   (?P<punctuation> \. | \! | \? | \, | \; | \: ) |
   (?P<whitespace> \s+ ) |
   (?P<eof> $ ) |
   (?P<error> \S )
"""

scan = re.compile(pattern=regex, flags=re.VERBOSE).finditer

for match in scan('Hi, my name is Joe. I am 1 programmer.'):
    token_type = match.lastgroup
    if token_type == 'number':
        print 'found number "%s"' % match.group()
    elif token_type == 'word':
        print 'found word "%s"' % match.group()
    elif token_type == 'punctuation':
        print 'found punctuation character "%s"' % match.group()
    elif token_type == 'whitespace':
        print 'found whitespace'
    elif token_type == 'eof':
        print 'done parsing'
        break
    else:
        raise ValueError('String kaputt!')
like image 20
pillmuncher Avatar answered May 22 '26 00:05

pillmuncher