Python re.X vs automagic line continuation

Python provides a flag (re.X or re.VERBOSE) to allow annotation of regular expressions:

import re

a = re.compile(r"""\d +  # the integral part
                   \.    # the decimal point
                   \d *  # some fractional digits""", re.X)

However, with Python's automatic concatenation of adjacent string literals, you could achieve basically the same thing:

a = re.compile(r'\d+' # integral part
               r'\.'  # decimal point
               r'\d*' # optional fractional digits
              )

I don't think I've really seen the latter form used, but (IMHO) it makes for an easier-to-read regex: I don't need to figure out which whitespace has been escaped and which is being ignored, and my text editor formats my comments as comments. Is there a reason to prefer the former over the latter, or vice versa? Or is this really a tomato-Tomato issue?
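For reference, here is a quick sanity check that the two forms accept the same inputs. It is only a sketch: the names `verbose`, `concatenated`, and `samples` are made up for illustration, and the sample strings are arbitrary.

```python
import re

# Form 1: one verbose pattern. Because of re.X, the regex engine ignores the
# unescaped whitespace and the "#" comments inside the string.
verbose = re.compile(r"""\d +  # the integral part
                         \.    # the decimal point
                         \d *  # some fractional digits""", re.X)

# Form 2: adjacent string literals are joined by the Python compiler, so the
# regex engine only ever sees r'\d+\.\d*'. The comments are ordinary Python
# comments and never reach the engine.
concatenated = re.compile(r'\d+'  # integral part
                          r'\.'   # decimal point
                          r'\d*'  # optional fractional digits
                          )

# Arbitrary sample inputs (an assumption, not from the original question).
samples = ["3.14", "10.", ".5", "abc", "1 . 5"]
results = [(s, bool(verbose.fullmatch(s)), bool(concatenated.fullmatch(s)))
           for s in samples]

for text, v, c in results:
    print(f"{text!r}: verbose={v} concatenated={c}")
```

Both compiled objects should agree on every sample; the only difference is where the whitespace and comments get stripped (by the regex engine in the first form, by Python itself in the second).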

mgilson asked Feb 08 '13 14:02



2 Answers

The former is a single plain string, so it can be put in a text file of its own and then loaded without resorting to ast.literal_eval. With complicated REs (or a choice among several different REs), that may be a benefit.
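A minimal sketch of that idea, assuming the pattern lives in a UTF-8 text file. The helper `load_verbose_regex` and the file name `decimal.regex` are hypothetical; a temporary directory stands in for wherever the pattern files would really be kept.

```python
import re
import tempfile
from pathlib import Path

# The file content is exactly what would go between the triple quotes:
# no r-prefix, no quoting, just the annotated pattern.
PATTERN_TEXT = r"""\d +  # the integral part
\.    # the decimal point
\d *  # some fractional digits
"""


def load_verbose_regex(path):
    """Read a pattern from a text file and compile it with re.X."""
    return re.compile(Path(path).read_text(encoding="utf-8"), re.X)


with tempfile.TemporaryDirectory() as tmp:
    pattern_file = Path(tmp) / "decimal.regex"  # hypothetical file name
    pattern_file.write_text(PATTERN_TEXT, encoding="utf-8")
    decimal = load_verbose_regex(pattern_file)

matched = decimal.fullmatch("3.14") is not None
print(matched)
```

The concatenated form could not be stored this way as-is, because the file would contain Python string literals and comments that something would have to evaluate first.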

Fred Foo answered Sep 19 '22 19:09


I'd say it's tomato-tomahto. The "x" regular expression flag is not exclusive to Python and may make more sense in languages where string concatenation is more verbose (imagine a + between every piece adding to the noise).

I also consider it a positive that the flag forces you to indicate explicitly which whitespace is part of the expression: that removes any ambiguity and makes it hard to miss quirks in regexen that depend on whitespace.
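To illustrate the whitespace point, here is a small sketch. The pattern names and the test strings are invented for the example.

```python
import re

# Under re.X a bare space in the pattern is ignored, so this is really r'foobar'.
ignored_space = re.compile(r"foo bar", re.X)

# To match a real space it has to be spelled out: either escaped with a
# backslash or placed inside a character class.
escaped_space = re.compile(r"foo\ bar", re.X)
class_space = re.compile(r"foo[ ]bar", re.X)

checks = {
    "bare space matches 'foobar'": ignored_space.fullmatch("foobar") is not None,
    "bare space matches 'foo bar'": ignored_space.fullmatch("foo bar") is not None,
    "escaped space matches 'foo bar'": escaped_space.fullmatch("foo bar") is not None,
    "class space matches 'foo bar'": class_space.fullmatch("foo bar") is not None,
}

for label, ok in checks.items():
    print(f"{label}: {ok}")
```

Only the second check should come out False: the unescaped space never reaches the matcher, which is exactly the kind of quirk the flag makes you state on purpose.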

One final argument is that you can copy the exact pattern over to another language that has the same flag and it will work without much effort. With the concatenated form, I'd have to remove a lot of r prefixes and quotes.


By the way, the two are not mutually exclusive: you could always use string concatenation together with the re.X option.
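A sketch of what that combination might look like; the name `number` and the spacing inside the literals are just one possible layout, not something from the question.

```python
import re

# Each piece is its own raw string literal with an ordinary Python comment
# next to it. re.X additionally lets the pieces carry padding spaces, which
# the regex engine ignores.
number = re.compile(
    r' \d + '   # integral part
    r' \.   '   # decimal point
    r' \d * ',  # optional fractional digits
    re.X,
)

print(number.fullmatch("3.14") is not None)
print(number.fullmatch("x") is not None)
```

One thing to watch with this mix: a "#" placed inside one of the string pieces would start a regex comment that runs to the end of the joined string (there are no newlines in it), silently swallowing the rest of the pattern. Keeping the comments outside the literals, as above, avoids that.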

Explosion Pills answered Sep 19 '22 19:09