I get deep indentation when I write code like the following:

    match = re.search(some_regex_1, s)
    if match:
        # do something with match data
    else:
        match = re.search(some_regex_2, s)
        if match:
            # do something with match data
        else:
            match = re.search(some_regex_3, s)
            if match:
                # do something with match data
            else:
                # ...
                # and so on
I tried to rewrite it as:

    if match = re.search(some_regex_1, s):
        # ...
    elif match = re.search(some_regex_2, s):
        # ...
    elif ....
        # ...
    ...
but Python doesn't allow that syntax. What should I do to avoid deep indentation in this case?
Put the regexes in a sequence and loop over them, stopping at the first one that matches:

    regexes = (regex1, regex2, regex3)
    for regex in regexes:
        match = re.search(regex, s)
        if match:
            # do stuff
            break
Alternatively (more advanced):
    def process1(match_obj):
        pass  # handle match 1

    def process2(match_obj):
        pass  # handle match 2

    def process3(match_obj):
        pass  # handle match 3

    # ... more handlers as needed ...

    # Pair each regex with the function that handles its match.
    handler_map = ((regex1, process1), (regex2, process2), (regex3, process3))
    for regex, handler in handler_map:
        match = re.search(regex, s)
        if match:
            result = handler(match)
            break
    else:
        # for/else: runs only if the loop finished without a break,
        # i.e. no regex matched
        pass
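As a concrete illustration of the handler table above, here is a minimal runnable sketch. The patterns, the handler names (`handle_date`, `handle_number`, `handle_word`) and the `dispatch` helper are made up for this example and are not part of the answer above; substitute your own.

```python
import re

# Hypothetical handlers: each receives the match object for its pattern.
def handle_date(match):
    return ("date", match.group(1), match.group(2), match.group(3))

def handle_number(match):
    return ("number", int(match.group()))

def handle_word(match):
    return ("word", match.group())

# Order matters: the first pattern that matches wins.
HANDLERS = (
    (re.compile(r"(\d{4})-(\d{2})-(\d{2})"), handle_date),
    (re.compile(r"\d+"), handle_number),
    (re.compile(r"[A-Za-z]+"), handle_word),
)

def dispatch(s):
    """Return the result of the first matching handler, or None."""
    for regex, handler in HANDLERS:
        match = regex.search(s)
        if match:
            return handler(match)
    return None  # no pattern matched
```

Wrapping the loop in a function lets `return` take the place of both the `break` and the `for`/`else` branch, which some people find easier to read.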
If you can use finditer() instead of search() (most of the time you can), you could join all your regexes into one and use symbolic group names. Here is an example:
    import re

    # Raw string, so the backslashes reach the regex engine unchanged.
    regex = r"""
        (?P<number> \d+ ) |
        (?P<word> \w+ ) |
        (?P<punctuation> \. | \! | \? | \, | \; | \: ) |
        (?P<whitespace> \s+ ) |
        (?P<eof> $ ) |
        (?P<error> \S )
    """
    scan = re.compile(pattern=regex, flags=re.VERBOSE).finditer
    for match in scan('Hi, my name is Joe. I am 1 programmer.'):
        # lastgroup is the name of the alternative that matched
        token_type = match.lastgroup
        if token_type == 'number':
            print('found number "%s"' % match.group())
        elif token_type == 'word':
            print('found word "%s"' % match.group())
        elif token_type == 'punctuation':
            print('found punctuation character "%s"' % match.group())
        elif token_type == 'whitespace':
            print('found whitespace')
        elif token_type == 'eof':
            print('done parsing')
            break
        else:
            raise ValueError('String kaputt!')
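On Python 3.8 and later, assignment expressions (the `:=` operator, PEP 572) make the `if`/`elif` chain from the question work almost as written. A minimal sketch, with placeholder patterns and a hypothetical `classify` function chosen only for illustration:

```python
import re

def classify(s):
    # Each branch binds `match` and tests it in one expression (Python 3.8+).
    if (match := re.search(r"\d+", s)):
        return "number: " + match.group()
    elif (match := re.search(r"[A-Za-z]+", s)):
        return "word: " + match.group()
    elif (match := re.search(r"[.!?]", s)):
        return "punctuation: " + match.group()
    else:
        return "no match"
```

This keeps the code flat without a loop, which may be the simplest option when each pattern needs different handling and older Python versions are not a concern.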