I have a Perl regex. But I'm not sure what "?" means in this context.
m#(?:\w+)#
What does ? mean here?
In this case, the ? is actually being used in connection with the : that follows it. Put together, ?: at the beginning of a grouping means to group but not capture the text/pattern within the parentheses (as in, it will not be stored in any backreferences like \1 or $1, so you will not be able to access the grouped text directly).
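To make the difference concrete, here is a minimal sketch comparing a capturing group with the non-capturing one from the question. The sample string and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $text = "hello world";

# Capturing group: the matched text is stored in $1.
my $captured;
if ($text =~ m#(\w+)#) {
    $captured = $1;
}

# Non-capturing group: the pattern still matches, but nothing is captured.
# With no capturing parentheses, a successful match in list context
# returns the list (1) rather than any matched text.
my @groups = $text =~ m#(?:\w+)#;

# Only capturing parentheses are numbered, so the second word
# lands in $1 here, not $2.
my $second;
if ($text =~ m#(?:\w+)\s+(\w+)#) {
    $second = $1;
}

print "captured: $captured\n";   # captured: hello
print "groups:   @groups\n";     # groups:   1
print "second:   $second\n";     # second:   world
```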
More specifically, a ? has three distinct meanings in regex:
1. The ? quantifier signifies "zero or one repetitions" of an expression. One of the canonical examples I've seen is s?he, which will match both she and he, since the ? makes the s "optional".
2. When a quantifier (+, *, ?, or the general {n,m}) is followed by a ?, the match is non-greedy (i.e. it will match the shortest string starting from that position that allows the match to proceed).
3. A ? at the beginning of a parenthesized group signifies that you want to perform a special action. As in this case, : means to group but not capture. The exact list of actions available will vary somewhat from one regex engine to another, but here's a list (not necessarily all-inclusive) of some of them:
A. Non-capturing group: (?:text)
B. Lookaround: (?=a) for a lookahead, ?! for negative lookahead, or ?<= and ?<! for lookbehinds (positive and negative, respectively).
C. Conditional matches: (?(condition)then|else).
D. Atomic grouping: a(?>bc|b)c (matches abcc but not abc; see the link)
E. Inline enabling/disabling of regex matching modifiers: ?i to enable a mode, ?-i to disable it. You can also enable/disable more than one modifier at a time by simply concatenating them, such as ?im (i is case insensitive and m is multiline).
F. Named capture groups: (?P<name>pattern), which can later be referenced using (?P=name). The .NET regex engine uses the syntax (?<name>pattern) instead, which Perl 5.10 and later also accept.
G. Comments: (?#Comment text). I personally think this just adds clutter, but I guess it could serve some use... free-spacing mode might be a better option (the (?x) modifier).
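Here is a rough sketch that exercises the three meanings in Perl, including a few of the special groups from the list. The sample strings and variable names are invented for the example, and the named capture assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# 1. "?" as a quantifier: the s is optional.
my @optional = grep { /^s?he$/ } qw(she he the);

# 2. "?" after a quantifier: greedy vs. non-greedy.
my $html = "<b>bold</b>";
my ($greedy) = $html =~ /(<.+>)/;     # as much as possible
my ($lazy)   = $html =~ /(<.+?>)/;    # as little as possible

# 3. "?" opening a special group.

# Lookahead: replace "foo" only where "bar" follows; "bar" is not consumed.
(my $replaced = "foobaz foobar") =~ s/foo(?=bar)/FOO/;

# Atomic group: once (?>bc|b) has matched "bc" it will not backtrack to "b".
my $atomic_abcc = "abcc" =~ /^a(?>bc|b)c$/ ? 1 : 0;
my $atomic_abc  = "abc"  =~ /^a(?>bc|b)c$/ ? 1 : 0;
my $plain_abc   = "abc"  =~ /^a(?:bc|b)c$/ ? 1 : 0;   # ordinary group can backtrack

# Inline modifier: (?i) turns on case-insensitive matching.
my $case_insensitive = "HELLO" =~ /(?i)hello/ ? 1 : 0;

# Named capture groups: values are read back through the %+ hash.
my ($year, $month);
if ("2024-05" =~ /(?P<year>\d{4})-(?P<month>\d{2})/) {
    ($year, $month) = ($+{year}, $+{month});
}

print "optional: @optional\n";                                # she he
print "greedy:   $greedy\n";                                  # <b>bold</b>
print "lazy:     $lazy\n";                                    # <b>
print "replaced: $replaced\n";                                # foobaz FOObar
print "atomic:   $atomic_abcc $atomic_abc $plain_abc\n";      # 1 0 1
print "nocase:   $case_insensitive\n";                        # 1
print "named:    $year / $month\n";                           # 2024 / 05
```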
So essentially, the meaning of the ? just depends on context. If you wanted zero or one repetitions of a literal ( character, you'd have to use \(? to escape the paren.
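As a small illustration of that last point, the sketch below uses a made-up list of strings to show \(? matching an optional literal parenthesis.

```perl
use strict;
use warnings;

# \(? is an escaped literal "(" followed by the "zero or one" quantifier.
my @inputs  = ("call", "call(", "call((");
my @matched = grep { /^call\(?$/ } @inputs;

print "$_\n" for @matched;    # prints "call" and "call("
```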
$ perldoc perlreref:
    (?:...)    Groups subexpressions without capturing (cluster)
You can also use YAPE::Regex::Explain:
C:\Temp> perl -MYAPE::Regex::Explain -e \
    "print YAPE::Regex::Explain->new(qr#(?:\w+)#)->explain"
The regular expression:

(?-imsx:(?:\w+))

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (?:                      group, but do not capture:
----------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of grouping
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------