Too much indentation when checking a string against many regular expressions serially in Python

Question

I end up with deeply nested indentation when I write code like this:

import re

match = re.search(some_regex_1, s)
if match:
    pass  # do something with match data
else:
    match = re.search(some_regex_2, s)
    if match:
        pass  # do something with match data
    else:
        match = re.search(some_regex_3, s)
        if match:
            pass  # do something with match data
        else:
            pass  # ... and so on

I tried to rewrite it as:

if match = re.search(some_regex_1, s):
    # ...
elif match = re.search(some_regex_2, s):
    # ...
elif ....
    # ...
...

but Python doesn't allow that syntax. What should I do to avoid deep indentation in this case?
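
For readers on Python 3.8 or later: assignment expressions (the `:=` operator, PEP 572) allow a form very close to the one attempted above. The following is only a minimal sketch. The function name `classify`, the three patterns, and the returned tuples are hypothetical stand-ins for the real regexes and "do something" bodies.

```python
import re


def classify(s):
    """Return a (label, text) pair for the first pattern that matches, else None."""
    # Hypothetical patterns; substitute some_regex_1, some_regex_2, ... here.
    if match := re.search(r"\d+", s):
        return ("number", match.group())
    elif match := re.search(r"[A-Za-z]+", s):
        return ("word", match.group())
    elif match := re.search(r"[.!?]", s):
        return ("punctuation", match.group())
    return None
```

The patterns are tried in the order written, so `classify("abc 42")` reports the number rather than the word. On older Python versions this syntax is not available, and the answers below apply.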

Silas Ray · Accepted Answer

import re

regexes = (regex1, regex2, regex3)
for regex in regexes:
    match = re.search(regex, s)
    if match:
        # do stuff with the first regex that matches
        break

Alternatively (more advanced):

def process1(match_obj):
    pass  # handle match 1

def process2(match_obj):
    pass  # handle match 2

def process3(match_obj):
    pass  # handle match 3

# ...

# Pair each regex with the function that handles its match.
handler_map = ((regex1, process1), (regex2, process2), (regex3, process3))
for regex, handler in handler_map:
    match = re.search(regex, s)
    if match:
        result = handler(match)
        break
else:
    # for/else: this branch runs only if the loop ended without break,
    # i.e. no regex matched
    pass
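
To make the handler-map idea concrete, here is a small self-contained sketch. Everything specific in it is an assumption for illustration: the three patterns, the handler names (`handle_date`, `handle_number`, `handle_word`), and the choice to return `None` when nothing matches are not part of the answer above.

```python
import re


# Hypothetical handlers: each receives the match object for its own pattern.
def handle_date(match):
    year, month, day = (int(part) for part in match.groups())
    return ("date", year, month, day)


def handle_number(match):
    return ("number", int(match.group(1)))


def handle_word(match):
    return ("word", match.group(1))


# Order matters: the first pattern that matches wins, so the most
# specific pattern (the date) is listed before the more general ones.
HANDLERS = (
    (re.compile(r"(\d{4})-(\d{2})-(\d{2})"), handle_date),
    (re.compile(r"(\d+)"), handle_number),
    (re.compile(r"([A-Za-z]+)"), handle_word),
)


def dispatch(s):
    for pattern, handler in HANDLERS:
        match = pattern.search(s)
        if match:
            result = handler(match)
            break
    else:
        # Reached only when no pattern matched (the loop never hit break).
        result = None
    return result
```

Adding another case then means adding one handler and one tuple, and the indentation depth stays the same no matter how many patterns there are.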

pillmuncher · Answer

If you can use finditer() instead of search() (most of the time you can), you could join all your regexes into one and use symbolic group names. Here is an example:

import re

# Raw string, so backslashes such as \d reach the regex engine unchanged.
regex = r"""
   (?P<number> \d+ ) |
   (?P<word> \w+ ) |
   (?P<punctuation> \. | \! | \? | \, | \; | \: ) |
   (?P<whitespace> \s+ ) |
   (?P<eof> $ ) |
   (?P<error> \S )
"""

scan = re.compile(pattern=regex, flags=re.VERBOSE).finditer

for match in scan('Hi, my name is Joe. I am 1 programmer.'):
    # lastgroup is the name of the alternative that matched
    token_type = match.lastgroup
    if token_type == 'number':
        print('found number "%s"' % match.group())
    elif token_type == 'word':
        print('found word "%s"' % match.group())
    elif token_type == 'punctuation':
        print('found punctuation character "%s"' % match.group())
    elif token_type == 'whitespace':
        print('found whitespace')
    elif token_type == 'eof':
        print('done parsing')
        break
    else:
        raise ValueError('String kaputt!')
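
The `if`/`elif` chain on `match.lastgroup` can itself be replaced by a dictionary lookup. The sketch below is one possible variation, not part of the answer above: the names `TOKEN_RE`, `CONVERTERS`, and `tokenize` are hypothetical, the `eof` alternative is dropped because `finditer()` stops on its own at the end of the string, and the punctuation alternatives are written as a character class.

```python
import re

# Same idea as above: one combined pattern with named alternatives.
TOKEN_RE = re.compile(r"""
    (?P<number> \d+ ) |
    (?P<word> \w+ ) |
    (?P<punctuation> [.!?,;:] ) |
    (?P<whitespace> \s+ ) |
    (?P<error> \S )
""", re.VERBOSE)

# Hypothetical per-token converters, keyed by group name.
CONVERTERS = {
    "number": int,
    "word": str,
    "punctuation": str,
}


def tokenize(text):
    """Return a list of (token_type, value) pairs, skipping whitespace."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        token_type = match.lastgroup
        if token_type == "whitespace":
            continue
        if token_type == "error":
            raise ValueError("unexpected character %r" % match.group())
        tokens.append((token_type, CONVERTERS[token_type](match.group())))
    return tokens
```

With this layout, supporting a new token type means adding one alternative to the pattern and one entry to the dictionary.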

Tags: python, indentation, conditional-statements

Le Curious
