I have a long regex that I want to continue onto the next line, but everything I've tried either raises an EOL error or breaks the regex. I have already continued the line once inside the parentheses, and I have read "How can I do a line break (line continuation)?" among other things.
Working, but still too long:
REGEX = re.compile(
r'\d\s+\d+\s+([A-Z0-9-]+)\s+([0-9]+.\d\(\d\)[A-Z0-9]+)\s+([a-zA-Z\d-]+)')
Wrong:
REGEX = re.compile(
r'\d\s+\d+\s+([A-Z0-9-]+)\s+([0-9]+.\d\(\d\)[A-Z0-9]+
)\s+([a-zA-Z\d-]+)')
SyntaxError: EOL while scanning string literal
REGEX = re.compile(
r'\d\s+\d+\s+([A-Z0-9-]+)\s+([0-9]+.\d\(\d\
)[A-Z0-9]+)\s+([a-zA-Z\d-]+)')
sre_constants.error: unbalanced parenthesis
REGEX = re.compile(
r'\d\s+\d+\s+([A-Z0-9-]+)\s+( \
[0-9]+.\d\(\d\)[A-Z0-9]+)\s+([a-zA-Z\d-]+)')
regex no longer works
REGEX = (re.compile(
r'\d\s+\d+\s+([A-Z0-9-]+)\s+(
[0-9]+.\d\(\d\)[A-Z0-9]+)\s+([a-zA-Z\d-]+)'))
SyntaxError: EOL while scanning string literal
I have since shortened my regex so that this is no longer an issue, but I'm still interested to know: how can I do line continuation with a long regex?
To split a string by a regular expression in JavaScript, pass a regex to the split() method, e.g. str.split(/[,. \s]/). The split method takes a string or regular expression as the separator and returns an array of substrings.
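The call above uses JavaScript syntax. Since the rest of this thread is about Python, here is a minimal sketch of the comparable operation using re.split from the standard library; the sample string is made up for illustration.

```python
import re

# Hypothetical input, chosen only to exercise each separator once.
text = "a,b.c d"

# Split on a comma, a period, or any whitespace character.
parts = re.split(r"[,.\s]", text)
print(parts)  # ['a', 'b', 'c', 'd']
```

Unlike the JavaScript method, re.split is a module-level function that takes the pattern first and the string second.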
By default in most regex engines, . doesn't match newline characters, so matching stops at the end of each logical line. If you want . to match everything, including newlines, you need to enable “dot-matches-all” mode in your regex engine of choice (for example, the re.DOTALL flag in Python, or /s in PCRE).
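A small sketch of the difference in Python, assuming a made-up two-line string:

```python
import re

# Hypothetical two-line input.
text = "first line\nsecond line"

# Default mode: '.' does not match '\n', so the match ends at the line break.
default_match = re.match(r".*", text).group()

# DOTALL mode: '.' also matches '\n', so the match covers the whole string.
dotall_match = re.match(r".*", text, re.DOTALL).group()

print(repr(default_match))   # 'first line'
print(dotall_match == text)  # True
```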
The Multiline option, or the m inline option, enables the regular expression engine to handle an input string that consists of multiple lines. It changes the interpretation of the ^ and $ language elements so that they match the beginning and end of each line, instead of only the beginning and end of the input string.
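In Python the corresponding flag is re.MULTILINE. The sketch below, using a made-up three-line string, shows how the anchors behave with and without it:

```python
import re

# Hypothetical three-line input.
text = "alpha\nbeta\ngamma"

# Without MULTILINE, ^ and $ anchor to the whole string, and \w cannot
# cross the newlines, so nothing matches.
whole_string = re.findall(r"^\w+$", text)

# With MULTILINE, ^ and $ anchor at every line boundary.
per_line = re.findall(r"^\w+$", text, re.MULTILINE)

print(whole_string)  # []
print(per_line)      # ['alpha', 'beta', 'gamma']
```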
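If you use the re.VERBOSE flag, you can split your regular expression up as much as you like to make it more readable. In verbose mode, whitespace in the pattern is ignored (unless it is escaped or inside a character class), and # starts a comment that runs to the end of the line: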
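import re

pattern = r"""
\d\s+                            # a digit, then whitespace
\d+\s+                           # one or more digits, then whitespace
([A-Z0-9-]+)\s+                  # group 1
([0-9]+.\d\(\d\)[A-Z0-9]+)\s+    # group 2
([a-zA-Z\d-]+)                   # group 3
"""
REGEX = re.compile(pattern, re.VERBOSE)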
This approach is explained in the excellent "Dive Into Python" book.
See "Verbose Regular Expressions".
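Another option, if you would rather not change how the pattern is interpreted, is to rely on Python's implicit concatenation of adjacent string literals. The sketch below splits the regex from the question across three raw strings; the sample input line is hypothetical, invented only to show that the pattern still matches.

```python
import re

# Adjacent string literals inside the parentheses are joined into one
# string by the compiler, so no backslash or newline ends up in the pattern.
REGEX = re.compile(
    r'\d\s+\d+\s+([A-Z0-9-]+)\s+'
    r'([0-9]+.\d\(\d\)[A-Z0-9]+)\s+'
    r'([a-zA-Z\d-]+)'
)

# Hypothetical sample line (not from the original question).
sample = '1 23 ABC-1 12.3(4)XY foo-bar'
match = REGEX.search(sample)
print(match.groups())  # ('ABC-1', '12.3(4)XY', 'foo-bar')
```

Each piece needs its own r prefix, because the prefix applies per literal rather than to the concatenated result.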