What does `?` mean in this Perl regex?

Tags: regex, perl

I have a Perl regex, but I'm not sure what the "?" means in this context:

m#(?:\w+)#

What does ? mean here?

Nikita asked Oct 08 '10 13:10


2 Answers

In this case, the ? is being used together with the :. At the beginning of a group, ?: means "group but do not capture": the text matched inside the parentheses is not stored in a backreference such as \1 or $1, so you cannot access the grouped text directly afterward.
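To make the difference concrete, here is a minimal sketch comparing a capturing group with the non-capturing form from the question. The sample string and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $text = "hello world";

# Both groups capture: the first word lands in $1, the second in $2.
my ($first, $second) = $text =~ m#(\w+) (\w+)#;

# The first group only clusters, so the second group is numbered $1.
my ($only) = $text =~ m#(?:\w+) (\w+)#;

print "capturing: $first, $second\n";   # capturing: hello, world
print "non-capturing: $only\n";         # non-capturing: world
```

The non-capturing group still takes part in the match; it just does not get a number of its own.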

More specifically, a ? has three distinct meanings in regex:

  1. The ? quantifier signifies "zero or one repetitions" of an expression. One of the canonical examples I've seen is s?he, which will match both she and he since the ? makes the s "optional".

  2. When a quantifier (+, *, ?, or the general {n,m}) is followed by a ? then the match is non-greedy (i.e. it will match the shortest string starting from that position that allows the match to proceed)

  3. A ? at the beginning of a parenthesized group signifies that you want to perform a special action. As in this case, : means to group but not capture. The exact list of actions available will vary somewhat from one regex engine to another, but here's a list (not necessarily all-inclusive) of some of them:

    A. Non-capturing group: (?:text)
    B. Lookaround: (?=a) for a positive lookahead, (?!a) for a negative lookahead, and (?<=a) and (?<!a) for lookbehinds (positive and negative, respectively).
    C. Conditional Matches: (?(condition)then|else).
    D. Atomic Grouping: a(?>bc|b)c (matches abcc but not abc, because once the group has matched bc the engine will not backtrack into it to try b instead).
    E. Inline enabling/disabling of regex matching modifiers: (?i) to enable a mode, (?-i) to disable it. You can also enable/disable more than one modifier at a time by simply concatenating them, such as (?im) (i is case insensitive and m is multiline).
    F. Named capture groups: (?P<name>pattern), which can later be referenced using (?P=name). The .NET regex engine uses the syntax (?<name>pattern) instead. Perl 5.10 and later accepts both forms, with \k<name> as the backreference for the second one.
    G. Comments: (?#Comment text). I personally think this just adds clutter, but I guess it could serve some use...free-spacing mode might be a better option (the (?x) modifier).
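Here is a short sketch that exercises several of the forms listed above in Perl. The sample strings and variable names are only illustrative, and the named-capture part assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# 1. Optional quantifier: s?he matches both "she" and "he".
my @optional = grep { /^s?he$/ } qw(she he the);

# 2. Greedy vs. non-greedy: .+ takes as much as it can, .+? as little.
my $html = "<b>bold</b>";
my ($greedy) = $html =~ /(<.+>)/;
my ($lazy)   = $html =~ /(<.+?>)/;

# 3B. Lookahead: replace "foo" only where "baz" follows; "baz" is not consumed.
(my $replaced = "foobar foobaz") =~ s/foo(?=baz)/X/;

# 3D. Atomic group: no backtracking into the group once it has matched.
my @atomic = map { /a(?>bc|b)c/ ? 1 : 0 } qw(abcc abc);

# 3E. Inline modifier: (?i) turns on case-insensitive matching.
my $case_insensitive = "HELLO" =~ /(?i)hello/ ? 1 : 0;

# 3F. Named captures (Perl 5.10+): values are read from the %+ hash.
my ($year, $month) = "2010-10-08" =~ /(?<year>\d{4})-(?<month>\d\d)/
    ? ($+{year}, $+{month})
    : ();

print "optional: @optional\n";                  # optional: she he
print "greedy: $greedy\n";                      # greedy: <b>bold</b>
print "lazy: $lazy\n";                          # lazy: <b>
print "lookahead: $replaced\n";                 # lookahead: foobar Xbaz
print "atomic: @atomic\n";                      # atomic: 1 0
print "inline modifier: $case_insensitive\n";   # inline modifier: 1
print "named: $year-$month\n";                  # named: 2010-10
```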

So essentially, the meaning of the ? is contextual. If you wanted zero or one occurrences of a literal ( character, you'd have to use \(? so that the paren is escaped.
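As a quick illustration of the escaped form, the sketch below uses \(? and \)? to allow at most one literal parenthesis on each side of three digits. The pattern and test strings are invented for the example.

```perl
use strict;
use warnings;

# \(? is an optional literal "(" and \)? an optional literal ")".
my $pattern = qr/^\(?\d{3}\)?$/;

# 1 where the string matches, 0 where it does not.
my @results = map { $_ =~ $pattern ? 1 : 0 } ('(555)', '555', '((555)');

print "@results\n";   # 1 1 0
```

Without the backslash, the ( would open a group and the ? would be read as the start of a special group construct, not as a quantifier.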

eldarerathis answered Nov 15 '22 07:11


$ perldoc perlreref:

(?:...) Groups subexpressions without capturing (cluster)

You can also use YAPE::Regex::Explain:

C:\Temp> perl -MYAPE::Regex::Explain -e "print YAPE::Regex::Explain->new(qr#(?:\w+)#)->explain"

The regular expression:

(?-imsx:(?:\w+))

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (?:                      group, but do not capture:
----------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of grouping
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
Sinan Ünür answered Nov 15 '22 08:11