I'm trying to learn Atom's syntax highlighting/grammar rules, which heavily use JS regular expressions, and came across an unfamiliar pattern in the python grammar file.
The pattern starts with (?x), which is a construct I don't recognize. I tried it in an online regex tester, which reports it as invalid. My initial thought was that it represents an optional left paren, but I believe the paren would have to be escaped for that.
Does this only have meaning in Atom's CoffeeScript grammar, or am I overlooking a regex meaning?
(This pattern also appears in the TextMate language file that I believe Atom's grammar came from.)
A regular expression followed by a question mark (?) matches zero or one occurrences of the regular expression. Two regular expressions concatenated match an occurrence of the first followed by an occurrence of the second.
Most characters, including all letters ( a-z and A-Z ) and digits ( 0-9 ), match themselves. For example, the regex x matches the substring "x" ; z matches "z" ; and 9 matches "9" . Non-alphanumeric characters without special meaning in regex also match themselves.
But if you want to search for a literal question mark, you need to “escape” the regex interpretation of the question mark. You accomplish this by putting a backslash just before the question mark, like this: \? Likewise, if you want to match the period character, escape it by adding a backslash before it.
The question mark gives the regex engine two choices: try to match the part the question mark applies to, or do not try to match it. Because the question mark is greedy, the engine always tries to match that part first. Only if this causes the entire regular expression to fail will the engine try ignoring the part the question mark applies to.
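As a rough illustration of the points above, here is a minimal Python sketch. The sample patterns and strings (colou?r, why\?, 3\.14) are made up for this example and are not taken from the Atom grammar.

```python
import re

# '?' makes the preceding item optional, so both spellings match.
optional_u = re.compile(r"colou?r")

# A backslash removes the special meaning of '?' and '.'.
literal_question = re.compile(r"why\?")
literal_dot = re.compile(r"3\.14")

# Unescaped, 'y?' is an optional 'y', so the pattern can match just "wh".
unescaped_question = re.compile(r"why?")

# '?' is greedy: the optional 'b' is consumed when it is available.
greedy_result = re.match(r"ab?", "ab").group()
```

Here optional_u matches both "color" and "colour", while literal_question only matches text that actually contains "why?".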
If that regular expression gets processed in Python, it'll be compiled with the 'verbose' flag.
From the Python re docs:
(?aiLmsux)
(One or more letters from the set 'a', 'i', 'L', 'm', 's', 'u', 'x'.) The group matches the empty string; the letters set the corresponding flags: re.A (ASCII-only matching), re.I (ignore case), re.L (locale dependent), re.M (multi-line), re.S (dot matches all), and re.X (verbose), for the entire regular expression. (The flags are described in Module Contents.) This is useful if you wish to include the flags as part of the regular expression, instead of passing a flag argument to the re.compile() function.
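To see the effect in practice, here is a small sketch in Python. The phone-number-like pattern is a hypothetical example, not the one from the Atom grammar. It shows that a leading (?x) behaves like passing re.VERBOSE: whitespace and # comments inside the pattern are ignored.

```python
import re

# Inline flag: (?x) at the very start turns on verbose mode for the whole pattern.
inline = re.compile(r"""(?x)
    \b
    (\d{3})    # first group of digits
    [-.\s]?    # optional separator
    (\d{4})    # second group of digits
    \b
""")

# The same pattern written compactly, without verbose mode.
compact = re.compile(r"\b(\d{3})[-.\s]?(\d{4})\b")

# Equivalent to the inline form: the flag is passed as an argument instead.
flagged = re.compile(r" \d+  # one or more digits ", re.VERBOSE)

sample = "call 555-1234 today"
inline_groups = inline.search(sample).groups()
compact_groups = compact.search(sample).groups()
```

Both inline_groups and compact_groups hold the same two captured strings, and inline.flags includes re.VERBOSE even though no flag argument was passed.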