Python provides a flag (re.X or re.VERBOSE) to allow annotation of regular expressions:
import re
a = re.compile(r"""\d +  # the integral part
                   \.    # the decimal point
                   \d *  # some fractional digits""", re.X)
However, with Python's automatic concatenation of adjacent string literals, you could achieve basically the same thing:
a = re.compile(r'\d+'  # integral part
               r'\.'   # decimal point
               r'\d*'  # optional fractional digits
               )
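One way to check that the two forms really are interchangeable is to compile both and compare how they behave on a few inputs. The sketch below does that; the sample strings are arbitrary choices for illustration, not anything from the question.

```python
import re

# Form 1: one raw triple-quoted string, annotated via the re.X (re.VERBOSE) flag.
verbose = re.compile(r"""\d +  # the integral part
                         \.    # the decimal point
                         \d *  # some fractional digits""", re.X)

# Form 2: adjacent raw string literals are joined at compile time,
# and the annotations are ordinary Python comments.
concatenated = re.compile(r'\d+'  # integral part
                          r'\.'   # decimal point
                          r'\d*'  # optional fractional digits
                          )

# The source text differs, but both patterns should accept and reject the same inputs.
samples = ["3.14", "42.", "42", ".5", "abc"]
results = {s: (bool(verbose.fullmatch(s)), bool(concatenated.fullmatch(s))) for s in samples}
for s, (v, c) in results.items():
    print(f"{s!r:8} verbose={v} concatenated={c}")

print(concatenated.pattern)  # \d+\.\d*
```

Note that the concatenated form's `.pattern` is the compact regex, while the verbose form keeps its comments and whitespace in `.pattern`.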
I don't think I've really seen the latter form used, but (IMHO) it makes for an easier-to-read regex (I don't need to figure out which whitespace has been escaped and which whitespace is being ignored, etc.), and my text editor formats my comments as comments. Is there a reason to prefer the former over the latter, or vice versa? Or is this really a tomato-tomahto issue?
There is a difference between re.search() and re.match(). Both return a match object for the first match found in the string (or None if there is none), but re.match() only tries to match at the beginning of the string.
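A minimal illustration of that difference; the input string is a made-up example.

```python
import re

text = "price: 42 dollars"

# re.match only tries at position 0, and the string starts with "price", so it fails.
m = re.match(r"\d+", text)
print(m)  # None

# re.search scans forward and returns the first place the pattern matches.
s = re.search(r"\d+", text)
print(s.group())  # 42
```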
re.finditer() works like re.findall(), except that it returns an iterator yielding match objects instead of a list of matched strings (or tuples of groups, when the pattern has capturing groups). It scans the string from left to right, and matches are yielded in the order they are found.
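The sketch below contrasts the two calls on a made-up input. With capturing groups, findall() returns tuples of group strings, while finditer() yields full match objects that also carry positions.

```python
import re

text = "a1 b22 c333"
pattern = re.compile(r"([a-z])(\d+)")

# findall builds the whole list up front; with two groups, each item is a tuple of strings.
print(pattern.findall(text))  # [('a', '1'), ('b', '22'), ('c', '333')]

# finditer yields Match objects lazily, so positions and groups are both available.
for match in pattern.finditer(text):
    print(match.group(0), match.span())
# a1 (0, 2)
# b22 (3, 6)
# c333 (7, 11)
```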
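re.X (re.VERBOSE) allows comments in the regex. This flag makes a regex more readable: unescaped whitespace outside a character class is ignored, and a # outside a character class starts a comment that runs to the end of the line.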
The former can be put in a text file of its own and then loaded without resort to literal_eval. With complicated REs (or a choice among multiple different REs), that may be a benefit.
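A minimal sketch of that workflow, assuming the pattern lives in a plain text file. The file name `decimal.re` is hypothetical, and a temporary directory stands in for real storage. Because the file holds the regex text itself rather than a Python string literal, it can be passed straight to `re.compile` with `re.X`.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical file contents: the verbose pattern from the question, stored as plain text.
PATTERN_TEXT = r"""\d +  # the integral part
\.    # the decimal point
\d *  # some fractional digits
"""

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "decimal.re"  # hypothetical file name
    path.write_text(PATTERN_TEXT, encoding="utf-8")

    # The file contains the regex verbatim, with no quotes or r prefixes to strip,
    # so it can be compiled directly instead of being evaluated as Python source.
    decimal = re.compile(path.read_text(encoding="utf-8"), re.X)

print(bool(decimal.fullmatch("3.14")))  # True
```

The concatenated form, by contrast, only exists as Python source, so loading it from a file would mean evaluating that source in some way.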
I'd say it's tomato-tomahto. The "x" regular expression flag is not exclusive to Python and may make more sense in languages where the concatenation operation is more verbose (imagine + everywhere adding to the noise).
I also consider it a positive that it forces you to indicate explicitly which whitespace is part of the expression: it removes any ambiguity and makes it hard to miss quirks in regexen that depend on whitespace.
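The whitespace behaviour is easy to see in a small example: in verbose mode a bare space is silently dropped, so a literal space has to be written explicitly, either escaped or inside a character class.

```python
import re

# In verbose mode, unescaped whitespace in the pattern is ignored,
# so this pattern is effectively "helloworld".
loose = re.compile(r"hello world", re.X)
print(bool(loose.fullmatch("helloworld")))   # True
print(bool(loose.fullmatch("hello world")))  # False

# A literal space must be made explicit: escaped, or placed in a character class.
escaped = re.compile(r"hello\ world", re.X)
in_class = re.compile(r"hello[ ]world", re.X)
print(bool(escaped.fullmatch("hello world")), bool(in_class.fullmatch("hello world")))  # True True
```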
One final argument is that you can copy the exact pattern over to another language that has the same flag, and it will work without much effort. In the latter case, I'd have to remove a lot of r's and apostrophes.
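For the copy-the-pattern argument, one option is Python's inline `(?x)` flag, which stores verbose mode inside the pattern text itself. This is only a sketch: whether the same string works unchanged elsewhere depends on the target engine supporting inline `(?x)` and the same escapes, which is an assumption to check per language.

```python
import re

# (?x) at the very start of the pattern turns on verbose mode from inside the string,
# so no flags argument is passed to re.compile.
PATTERN = r"""(?x)
\d +   # the integral part
\.     # the decimal point
\d *   # some fractional digits
"""

compiled = re.compile(PATTERN)
print(bool(compiled.flags & re.X))       # True
print(bool(compiled.fullmatch("3.14")))  # True
```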
By the way, you could always concatenate with the re.X option.
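A sketch of combining the two approaches, with one caveat: the pieces are joined before re.X sees them, so a regex-style # comment inside a piece has no line end to stop at and swallows everything after it.

```python
import re

# Adjacent literals are joined first, then re.X strips the whitespace inside them,
# so each piece can be spaced out while the annotations stay Python comments.
decimal = re.compile(
    r"\d +"   # integral part
    r" \. "   # decimal point
    r"\d *",  # optional fractional digits
    re.X,
)
print(bool(decimal.fullmatch("3.14")))  # True

# Pitfall: the joined string has no newline, so the regex comment started by "#"
# runs to the end of the whole string and "\." and "\d*" become part of the comment.
broken = re.compile(r"\d+  # integral part" r"\." r"\d*", re.X)
print(bool(broken.fullmatch("3.14")), bool(broken.fullmatch("3")))  # False True
```

Keeping the annotations as Python comments (outside the string pieces) avoids this trap entirely.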