
How to write patterns for use with re.VERBOSE when they contain meaningful whitespace?

Regexes containing meaningful spaces break when re.VERBOSE is added, apparently because re.VERBOSE strips the meaningful space inside 'Issue Summary' along with all the non-meaningful whitespace (e.g. padding and newlines inside a multiline pattern). (My use of re.VERBOSE with multiline patterns is non-negotiable: this is a massive simplification of a huge multiline regex where re.VERBOSE is necessary just to stay sane.)

import re
re.match(r'''Issue Summary.*''', 'Issue Summary: fails', re.U|re.VERBOSE)
# No match!
re.match(r'''Issue Summary.*''', 'Issue Summary: passes', re.U)
<_sre.SRE_Match object at 0x10ba36030>
re.match(r'Issue Summary.*', 'Issue Summary: passes', re.U)
<_sre.SRE_Match object at 0x10b98ff38>

Is there a saner alternative to write re.VERBOSE-friendly patterns containing meaningful spaces, short of replacing each instance in my pattern with '\s' or '.', which is not just ugly but counter-intuitive and a pain to automate?

re.match(r'Issue\sSummary.*', 'Issue Summary: fails', re.VERBOSE)
<_sre.SRE_Match object at 0x10ba36030>
re.match(r'Issue.Summary.*', 'Issue Summary: fails', re.VERBOSE)
<_sre.SRE_Match object at 0x10b98ff38>

(As an aside, this is a useful docbug catch on Python 2 and 3. I'll file it once I get consensus here on what the right solution is.)

asked Nov 17 '17 00:11 by smci




1 Answer

If re.VERBOSE is used, then I think there's no choice other than to change the regular expression string. However, I would suggest one of the following:

r'abc\ def'

or:

r'abc[ ]def'

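Both r'\ ' and r'[ ]' match a single space character (only an actual space, not any whitespace). Both survive re.VERBOSE, which ignores whitespace in the pattern except inside a character class or after an unescaped backslash. Note that, without the r prefix, the backslash would need to be doubled, i.e. '\\ '.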
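As a rough sketch of how this looks in a multiline verbose pattern (the group name text and the sample strings are made up for illustration, not taken from the asker's real regex), both forms can sit alongside comments and padding. If the literal parts of the pattern are available as plain strings, re.escape() should also escape the space for you, which may help with automating the change:

```python
import re

# Verbose, multi-line pattern: layout whitespace and comments are ignored,
# but an escaped space and a space inside a character class are kept.
pattern = re.compile(r'''
    Issue\ Summary   # escaped space: matches one literal space
    :[ ]             # space inside a character class: also one literal space
    (?P<text>.*)     # rest of the line (hypothetical group name)
''', re.VERBOSE)

m = pattern.match('Issue Summary: fails')
text = m.group('text') if m else None

# Without the escape, VERBOSE drops the space, so this behaves like 'IssueSummary.*'
unescaped = re.match(r'Issue Summary.*', 'Issue Summary: fails', re.VERBOSE)

# For a literal fragment, re.escape() adds the backslash before the space
literal = re.escape('Issue Summary')
escaped_match = re.match(literal + r'.*', 'Issue Summary: fails', re.VERBOSE)

print(text, unescaped, literal, escaped_match is not None)
```

Of the two spellings, '[ ]' is arguably easier to spot when skimming a long pattern, while '\ ' is shorter; either one avoids the looser '\s' and '.' substitutes, which also match characters other than a space.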

answered Sep 29 '22 18:09 by Tom Karzes