Deprecated left curly bracket in Perl regex - exactly when?

Tags: regex, perl

perldoc perlre says this:

(If a curly bracket occurs in any other context and does not form part of a backslashed sequence like \x{...}, it is treated as a regular character. However, a deprecation warning is raised for all such occurrences, and in Perl v5.26, literal uses of a curly bracket will be required to be escaped, say by preceding them with a backslash ("\{") or enclosing them within square brackets ("[{]"). This change will allow for future syntax extensions (like making the lower bound of a quantifier optional), and better error checking of quantifiers.)

OK, so the following prints the deprecation message.

perl -lE 'm/x{x}/'

Why doesn't the following?

perl -lE 'm/x({x})/'

I.e., is an unescaped { allowed inside a capture group? Probably not, because

perl -lE 'm/x(x{x})/'

also prints the warning.

So, what is the exact "logic"?

P.S.: I will escape every literal {, but am wondering about the rationale behind the above.
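
For reference, here is a minimal sketch of the escaped forms the perldoc passage mentions, plus \Q...\E (quotemeta), which is not in the quote but is a standard Perl way to escape a whole literal string. The variable names are only illustrative.

```perl
use strict;
use warnings;

my $text    = 'x{x}';
my $literal = 'x{x}';   # illustrative: a string to be matched verbatim

# Backslash-escaped braces are matched literally.
my $by_backslash = $text =~ /x\{x\}/;

# A bracketed character class holding only the brace also matches it literally.
my $by_class = $text =~ /x[{]x[}]/;

# \Q...\E applies quotemeta, which backslash-escapes every non-word character.
my $by_quotemeta = $text =~ /\Q$literal\E/;

print "backslash: ", ($by_backslash ? "match" : "no match"), "\n";
print "class:     ", ($by_class     ? "match" : "no match"), "\n";
print "quotemeta: ", ($by_quotemeta ? "match" : "no match"), "\n";
```

All three forms avoid an unescaped literal { altogether, so none of them depends on where the brace sits in the pattern.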

asked Jul 31 '15 19:07 by jm666


1 Answer

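The warning is only emitted when the left curly bracket:

  • isn't at the beginning of the pattern or of a construct such as a group
  • follows an alphabetic character, or sits in the middle of a run of literal characters (the len check in the source)
  • is not part of a special escape sequence \b{}, \B{}, \g{}, \k{}, \N{}, \o{}, \p{}, \P{}, or \x{}
  • is not part of a quantifier of the form {n}, {n,}, or {n,m}, where n and m are non-negative integers

That is why m/x({x})/ is silent: the bracket comes directly after (, so it starts a new construct. In m/x(x{x})/ it follows the literal x, so the warning is raised.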

See regcomp.c in the Perl source (the excerpt below is from 5.22.0):

        case '{':
            /* Currently we don't warn when the lbrace is at the start
             * of a construct.  This catches it in the middle of a
             * literal string, or when its the first thing after
             * something like "\b" */
            if (! SIZE_ONLY
                && (len || (p > RExC_start && isALPHA_A(*(p -1)))))
            {
                ckWARNregdep(p + 1, "Unescaped left brace in regex is deprecated, passed through");
            }
            /*FALLTHROUGH*/
        default:    /* A literal character */
          normal_default:
            if (UTF8_IS_START(*p) && UTF) {
                STRLEN numlen;
                ender = utf8n_to_uvchr((U8*)p, RExC_end - p,
                                       &numlen, UTF8_ALLOW_DEFAULT);
                p += numlen;
            }
            else
                ender = (U8) *p++;
            break;
        } /* End of switch on the literal */

Demo:

$ perl -e '/{/'    # Beginning of pattern, no warning

$ perl -e '/.{/'   # Doesn't follow alpha, no warning

$ perl -e '/x{3}/' # Valid quantifier, no warning

$ perl -e '/\x{/'  # Part of special escape sequence \x{}, fatal error instead of the deprecation warning
Missing right brace on \x{} in regex; marked by <-- HERE in m/\x{ <-- HERE / at -e line 1.

$ perl -e '/x{/'   # Follows alpha, isn't a quantifier or special escape, warns
Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/x{ <-- HERE / at -e line 1.
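
To check what a particular perl does with these patterns, they can be compiled at run time while collecting the diagnostics. This is a minimal sketch: brace_diagnostics is a hypothetical helper name, not part of Perl. The output is version-dependent, since the perldoc passage quoted in the question says literal braces will have to be escaped from v5.26 on, so no expected output is shown.

```perl
use strict;
use warnings;

# Compile a pattern at run time and collect whatever this perl reports:
# warnings arrive through the __WARN__ handler, fatal errors through $@.
sub brace_diagnostics {
    my ($pattern) = @_;
    my @messages;
    local $SIG{__WARN__} = sub { push @messages, "warning: $_[0]" };
    my $compiled = eval { qr/$pattern/ };
    push @messages, "error: $@" unless defined $compiled;
    return @messages;
}

# The patterns from the question and the demo, plus one fully escaped form.
for my $pattern ('{', '.{', 'x{3}', 'x{', 'x({x})', 'x(x{x})', 'x\{x\}') {
    my @messages = brace_diagnostics($pattern);
    print "/$pattern/: ", (@messages ? join('', @messages) : "no diagnostic\n");
}
```

Whatever the version, the escaped pattern x\{x\} and the valid quantifier x{3} should compile without any diagnostic.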
answered Nov 07 '22 08:11 by ThisSuitIsBlackNot