What does `?` mean in this Perl regex?




I have a Perl regex, but I'm not sure what "?" means in this context.


What does ? mean here?

2 Answers

In this case, the ? is actually being used in connection with the :. Put together, ?: at the beginning of a grouping means to group but not capture the text/pattern within the parentheses (as in, it will not be stored in any backreferences like \1 or $1, so you will not be able to access the grouped text directly).
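To see the difference between capturing and non-capturing groups, here is a minimal sketch. The sample strings and variable names are made up for illustration and are not from the question's regex.

```perl
use strict;
use warnings;

my $text = "foobar";

# Capturing group: the text matched by (foo) is stored in $1,
# which a match in list context returns as its first element.
my ($captured) = $text =~ /(foo)bar/;

# Non-capturing group: (?:foo) groups but stores nothing, so the
# first capture belongs to the next plain group, (bar).
my ($first) = $text =~ /(?:foo)(bar)/;

# Grouping is still useful without capturing, e.g. to repeat "ab" as a unit.
my ($repeated) = "ababab" =~ /^((?:ab)+)$/;

print "capturing:     $captured\n";   # capturing:     foo
print "non-capturing: $first\n";      # non-capturing: bar
print "repeated:      $repeated\n";   # repeated:      ababab
```

The (?:...) form is handy when you need parentheses only for alternation or repetition and do not want to shift the numbering of the groups you actually care about.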

More specifically, a ? has three distinct meanings in regex:

  1. The ? quantifier signifies "zero or one repetitions" of an expression. A canonical example is s?he, which matches both she and he because the ? makes the s optional.

  2. When a quantifier (+, *, ?, or the general {n,m}) is followed by a ? then the match is non-greedy (i.e. it will match the shortest string starting from that position that allows the match to proceed)

  3. A ? at the beginning of a parenthesized group signifies that you want to perform a special action. As in this case, : means to group but not capture. The exact list of actions available will vary somewhat from one regex engine to another, but here's a list (not necessarily all-inclusive) of some of them:

    A. Non-capturing group: (?:text)
    B. Lookaround: (?=a) for a positive lookahead, (?!a) for a negative lookahead, and (?<=a) and (?<!a) for lookbehinds (positive and negative, respectively).
    C. Conditional Matches: (?(condition)then|else).
    D. Atomic Grouping: a(?>bc|b)c (matches abcc but not abc, because once the group has matched bc the engine will not backtrack into it to try b instead)
    E. Inline enabling/disabling of regex matching modifiers: (?i) to enable a mode, (?-i) to disable it. You can also enable/disable more than one modifier at a time by simply concatenating them, such as (?im) (i is case insensitive and m is multiline).
    F. Named capture groups: (?P<name>pattern), which can later be referenced using (?P=name). The .NET regex engine uses the syntax (?<name>pattern) instead. Perl 5.10 and later accepts both forms and exposes the captured text through the %+ hash.
    G. Comments: (?#Comment text). I personally think this just adds clutter, but I guess it could serve some use...free-spacing mode might be a better option (the (?x) modifier).
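A few of the special groups listed above can be tried out in Perl with a short script. This is only a sketch: the sample strings and the %result hash are invented for illustration, and the named-capture part assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

my %result;

# B. Lookaround: digits preceded by "$" (lookbehind) and not followed by "%".
($result{lookaround}) = 'cost: $42, rate: 17%' =~ /(?<=\$)(\d+)(?!%)/;

# C. Conditional: if group 1 matched an opening paren, require a closing one.
my $balanced = qr/^(\()?\w+(?(1)\))$/;
$result{cond_both}  = '(abc)' =~ $balanced ? 1 : 0;
$result{cond_none}  = 'abc'   =~ $balanced ? 1 : 0;
$result{cond_open}  = '(abc'  =~ $balanced ? 1 : 0;

# D. Atomic group: no backtracking into (?>...) once it has matched.
$result{atomic_abcc} = 'abcc' =~ /a(?>bc|b)c/ ? 1 : 0;
$result{atomic_abc}  = 'abc'  =~ /a(?>bc|b)c/ ? 1 : 0;

# E. Inline modifiers: case-insensitive for "hello" only.
$result{inline_mixed} = 'HELLO world' =~ /(?i)hello(?-i) world/ ? 1 : 0;
$result{inline_upper} = 'HELLO WORLD' =~ /(?i)hello(?-i) world/ ? 1 : 0;

# F. Named captures: read back through the %+ hash.
if ('2024-05' =~ /(?<year>\d{4})-(?<month>\d{2})/) {
    $result{year}  = $+{year};
    $result{month} = $+{month};
}

# G. Inline comment: (?#...) is ignored by the regex engine.
$result{comment} = 'abc' =~ /a(?#this text is ignored)bc/ ? 1 : 0;

print "$_ = $result{$_}\n" for sort keys %result;
```

Here lookaround is 42, cond_both and cond_none are 1 while cond_open is 0, atomic_abcc is 1 while atomic_abc is 0, and inline_mixed is 1 while inline_upper is 0.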

So essentially, the meaning of the ? depends on context. If you wanted zero or one repetitions of a literal ( character, you'd have to use \(? to escape the paren.
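The two quantifier-related meanings, and the escaped-paren case, can be checked with a quick sketch like the following. The test strings are invented for illustration.

```perl
use strict;
use warnings;

# 1. Optional quantifier: s?he matches both "she" and "he", but not "the".
my @optional = grep { /^s?he$/ } qw(she he the);

# 2. Lazy quantifier: .+ runs to the last closing tag, .+? stops at the first.
my $html = "<b>one</b> and <b>two</b>";
my ($greedy) = $html =~ /<b>(.+)<\/b>/;
my ($lazy)   = $html =~ /<b>(.+?)<\/b>/;

# Escaped paren: \(? is an optional literal "(", not the start of a group.
my @paren = grep { /^\(?x$/ } ('x', '(x', '((x');

print "optional: @optional\n";   # optional: she he
print "greedy:   $greedy\n";     # greedy:   one</b> and <b>two
print "lazy:     $lazy\n";       # lazy:     one
print "paren:    @paren\n";      # paren:    x (x
```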

$ perldoc perlreref:

(?:...) Groups subexpressions without capturing (cluster)

You can also use YAPE::Regex::Explain:

C:\Temp> perl -MYAPE::Regex::Explain -e "print YAPE::Regex::Explain->new(qr#(?:\w+)#)->explain"

The regular expression:

(?-imsx:(?:\w+))

matches as follows:

NODE                     EXPLANATION
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
  (?:                      group, but do not capture:
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
  )                        end of grouping
)                        end of grouping
