
Warning: preg_replace(): Unknown modifier

I have the following error:

Warning: preg_replace(): Unknown modifier ']' in xxx.php on line 38

This is the code on line 38:

<?php echo str_replace("</ul></div>", "", preg_replace("<div[^>]*><ul[^>]*>", "", wp_nav_menu(array('theme_location' => 'nav', 'echo' => false)) )); ?>

How can I fix this problem?

user3122995 asked Dec 20 '13 14:12


2 Answers

Why the error occurs

In PHP, a regular expression needs to be enclosed within a pair of delimiters. A delimiter can be any non-alphanumeric, non-backslash, non-whitespace character; /, #, ~ are the most commonly used ones. Note that it is also possible to use bracket style delimiters where the opening and closing brackets are the starting and ending delimiter, i.e. <pattern_goes_here>, [pattern_goes_here] etc. are all valid.
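As a quick illustration of the delimiter rule, the sketch below runs the same pattern (abc+) wrapped in several delimiter styles. The subject string and variable names are made up for the example.

```php
<?php
// The same pattern, abc+, wrapped in different delimiter styles.
$subject  = 'abccc';
$patterns = ['/abc+/', '#abc+#', '~abc+~', '<abc+>', '[abc+]'];

$results = [];
foreach ($patterns as $pattern) {
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    $results[$pattern] = preg_match($pattern, $subject);
}

print_r($results);
```

Every variant should compile to the same regex, because the delimiters are stripped off before the pattern reaches PCRE.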

The "Unknown modifier X" error usually occurs in the following two cases:

  • When your regular expression is missing delimiters.

  • When you use the delimiter inside the pattern without escaping it.

In this case, the regular expression is <div[^>]*><ul[^>]*>. The regex engine considers everything from < to > as the regex pattern, and everything afterwards as modifiers.

Regex: <div[^>  ]*><ul[^>]*>
       │     │  │          │
       └──┬──┘  └────┬─────┘
       pattern    modifiers

] here is an unknown modifier, because it appears after the closing > delimiter, which is why PHP throws that error.

Depending on the pattern, the unknown modifier complaint could just as well have been about *, +, p, / or ), or almost any other letter or symbol. Only imsxeADSUXJu are valid PCRE modifiers (and e was removed in PHP 7.0).
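To see the failure in isolation, here is a minimal sketch that feeds the undelimited pattern to preg_replace() and captures the warning with a temporary error handler. The subject string is a made-up stand-in for the menu markup.

```php
<?php
$warnings = [];

// Collect warnings instead of printing them, so they can be inspected.
set_error_handler(function ($severity, $message) use (&$warnings) {
    $warnings[] = $message;
    return true; // suppress PHP's default handling
});

// "<" and ">" are taken as paired delimiters, so "]" lands in modifier position.
$result = preg_replace('<div[^>]*><ul[^>]*>', '', '<div><ul><li>Home</li>');

restore_error_handler();

var_dump($result);                               // null signals an error
echo $warnings[0] ?? '(no warning)', "\n";
```

Since preg_replace() returns null on error, a null check is a reasonable safeguard wherever the pattern is built dynamically.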

How to fix it

The fix is easy. Just wrap your regex pattern with any valid delimiters. In this case, you could choose ~ and get the following:

~<div[^>]*><ul[^>]*>~
│                   │
│                   └─ ending delimiter
└───────────────────── starting delimiter
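Applied to the code from the question, the fix might look like the sketch below. To keep it runnable outside WordPress, $menu is a hypothetical stand-in for whatever wp_nav_menu() returns.

```php
<?php
// Hypothetical stand-in for the markup returned by wp_nav_menu().
$menu = '<div class="menu"><ul id="nav"><li>Home</li></ul></div>';

// "~" now delimits the pattern, so "<" and ">" are ordinary characters.
$stripped = preg_replace('~<div[^>]*><ul[^>]*>~', '', $menu);
$result   = str_replace('</ul></div>', '', $stripped);

echo $result, "\n"; // <li>Home</li>
```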

If you're receiving this error despite having used a delimiter, it might be because the pattern itself contains unescaped occurrences of the said delimiter.

Or escape delimiters

/foo[^/]+bar/i would certainly throw an error. So you can escape it using a \ backslash if it appears anywhere within the regex:

/foo[^\/]+bar/i
│      │     │
└──────┼─────┴─ actual delimiters
       └─────── escaped slash(/) character

This is a tedious job if your regex pattern contains many occurrences of the delimiter character.

The cleaner way, of course, would be to use a different delimiter altogether. Ideally a character that does not appear anywhere inside the regex pattern, say # - #foo[^/]+bar#i.
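Both approaches can be compared side by side. This is a small sketch with a made-up subject string; the two calls should behave identically.

```php
<?php
$subject = 'foo-and-bar';

// Option 1: keep "/" as the delimiter and escape it inside the pattern.
$escaped = preg_match('/foo[^\/]+bar/i', $subject);

// Option 2: switch to "#", so "/" needs no escaping at all.
$switched = preg_match('#foo[^/]+bar#i', $subject);

var_dump($escaped, $switched); // int(1) int(1)
```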

More reading:

  • PHP regex delimiters
  • http://www.regular-expressions.info/php.html
  • How can I convert ereg expressions to preg in PHP? (missing delimiters)
  • Unknown modifier '/' in …? what is it? (on using preg_quote())
Amal Murali answered Oct 02 '22 17:10


Other examples

The reference answer already explains the reason for "Unknown modifier" warnings. This is just a comparison of other typical variants.

  • When forgetting to add regex /delimiters/, the first non-letter symbol will be assumed to be one. Therefore the warning is often about what follows a grouping (…), […] meta symbol:

    preg_match("[a-zA-Z]+:\s*.$"
                ↑      ↑⬆
    
  • Sometimes your regex already uses a custom delimiter (: here), but still contains the same character as unescaped literal. It's then mistaken as premature delimiter. Which is why the very next symbol receives the "Unknown modifier ❌" trophy:

    preg_match(":\[[\d:/]+\]:"
                ↑     ⬆     ↑
    
  • When using the classic / delimiter, take care to not have it within the regex literally. This most frequently happens when trying to match unescaped filenames:

    preg_match("/pathname/filename/i"
                ↑        ⬆         ↑
    

    Or when matching angle/square bracket style tags:

    preg_match("/<%tmpl:id>(.*)</%tmpl:id>/Ui"
                ↑               ⬆         ↑
    
  • Templating-style (Smarty or BBCode) regex patterns often require {…} or […] brackets. Both should usually be escaped. (An outermost {} pair being the exception though).

    They also get misinterpreted as paired delimiters when no actual delimiter is used. If they're then also used as literal character within, then that's, of course … an error.

    preg_match("{bold[^}]+}"
                ↑      ⬆  ↑
    
  • Whenever the warning says "Delimiter must not be alphanumeric or backslash" then you also entirely forgot delimiters:

    preg_match("ab?c*"
                ↑
    
  • "Unknown modifier 'g'" often indicates a regex that was copied verbatim from JavaScript or Perl.

    preg_match("/abc+/g"
                      ⬆
    

    PHP doesn't use the /g global flag. Instead, the preg_replace function works on all occurrences, and preg_match_all is the "global" counterpart to the single-occurrence preg_match.

    So, just remove the /g flag.

    See also:
    · Warning: preg_replace(): Unknown modifier 'g'
    · preg_replace: bad regex == 'Unknown Modifier'?

  • A more peculiar case pertains to the PCRE_EXTENDED /x flag. This is often (or should be) used for making regexps more lofty and readable.

    This allows the use of inline # comments. PHP implements the regex delimiters atop PCRE, but it doesn't treat # in any special way. Which is how a literal delimiter in a # comment can become an error:

    preg_match("/
       ab?c+  # Comment with / slash in between
    /x"
    

    (Also noteworthy that using # as #abc+#x delimiter can be doubly inadvisable.)

  • Interpolating variables into a regex requires them to be pre-escaped, or be valid regexps themselves. You can't tell beforehand if this is gonna work:

     preg_match("/id=$var;/"
                 ↑    ↺   ↑
    

    It's best to apply $var = preg_quote($var, "/") in such cases.

    See also:
    · Unknown modifier '/' in ...? what is it?

    Another alternative is using \Q…\E escapes for unquoted literal strings:

     preg_match("/id=\Q{$var}\E;/mix");
    

    Note that this is merely a convenience shortcut for meta symbols, not dependable/safe. It would fall apart in case that $var contained a literal '\E' itself (however unlikely). And it does not mask the delimiter itself.

  • Deprecated modifier /e is an entirely different problem. This has nothing to do with delimiters, but the implicit expression interpretation mode being phased out. See also: Replace deprecated preg_replace /e with preg_replace_callback
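A few of the cases from this list can be sketched in working form: global matching without /g, quoting an interpolated variable, and a callback in place of the /e modifier. The sample strings and variable names are invented for illustration.

```php
<?php
// 1. No /g flag: preg_match() stops at the first hit,
//    preg_match_all() collects every occurrence.
$first = preg_match('/abc+/', 'abc abcc abccc', $one);
$count = preg_match_all('/abc+/', 'abc abcc abccc', $all);

// 2. Interpolated variable: preg_quote() escapes regex meta characters,
//    and its second argument additionally escapes the delimiter.
$var    = 'a/b.c';                    // hypothetical user-supplied value
$quoted = preg_quote($var, '/');      // a\/b\.c
$exact  = preg_match("/id=$quoted;/", 'id=a/b.c;');
$loose  = preg_match("/id=$quoted;/", 'id=a/bxc;');

// 3. Instead of the removed /e modifier, pass a callback.
$title = preg_replace_callback(
    '/\b[a-z]/',
    function ($m) { return strtoupper($m[0]); },
    'hello world'
);

var_dump($first, $count, $quoted, $exact, $loose, $title);
```

Without the second argument to preg_quote(), the slash in $var would terminate the pattern early and bring back the "Unknown modifier" warning.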

Alternative regex delimiters

As mentioned already, the quickest solution to this error is just picking a distinct delimiter. Any non-alphanumeric symbol (other than backslash or whitespace) can be used. Visually distinctive ones are often preferred:

  • ~abc+~
  • !abc+!
  • @abc+@
  • #abc+#
  • =abc+=
  • %abc+%

Technically you could use $abc$ or |abc| for delimiters. However, it's best to avoid symbols that serve as regex meta characters themselves.

The hash # as delimiter is rather popular too. But care should be taken in combination with the x/PCRE_EXTENDED readability modifier. You can't use # inline or (?#…) comments then, because those would be confused as delimiters.
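A delimiter that is neither # nor / sidesteps the comment problem. A minimal sketch, assuming ~ never needs to appear in the pattern or its comments:

```php
<?php
// "~" as delimiter leaves "#" free for /x comments and "/" free as a literal.
$pattern = '~
    ab?c+     # a, optional b, one or more c (slashes / are harmless here)
~x';

$matched = preg_match($pattern, 'xxabccc', $m);

var_dump($matched, $m[0]); // int(1) string(5) "abccc"
```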

Quote-only delimiters

Occasionally you see " and ' used as regex delimiters paired with their counterpart as PHP string enclosure:

  preg_match("'abc+'"
  preg_match('"abc+"'

Which is perfectly valid as far as PHP is concerned. It's sometimes convenient and unobtrusive, but not always legible in IDEs and editors.

Paired delimiters

An interesting variation is paired delimiters. Instead of using the same symbol on both ends of a regex, you can use any <...> (...) [...] {...} bracket/braces combination.

  preg_match("(abc+)"   # just delimiters here, not a capture group

While most of them also serve as regex meta characters, you can often use them without further effort. As long as those specific braces/parens within the regex are paired or escaped correctly, these variants are quite readable.
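The following sketch shows paired delimiters coexisting with the same brackets inside the pattern, as long as they nest correctly. Subjects and variable names are illustrative only.

```php
<?php
// Outer ( ) are delimiters; the inner ( ) is a real capture group.
$ok1 = preg_match('(a(b+)c)', 'xabbc', $m);

// Outer { } are delimiters; the inner {2,4} is a quantifier.
$ok2 = preg_match('{\d{2,4}}', 'year 2013', $y);

var_dump($m[0], $m[1], $y[0]); // "abbc", "bb", "2013"
```

An unbalanced literal bracket of the same type would still need a backslash, since PHP counts nesting levels to find the closing delimiter.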

Fancy regex delimiters

A somewhat lazy trick (which is not endorsed hereby) is using non-printable ASCII characters as delimiters. This works easily in PHP by using double quotes for the regex string, and octal escapes for delimiters:

 preg_match("\001 abc+ \001mix"

The \001 is just a control character that's not usually needed. Therefore it's highly unlikely to appear within most regex patterns. Which makes it suitable here, even though not very legible.

Sadly you can't use Unicode glyphs as delimiters. PHP only allows single-byte characters. And why is that? Well, glad you asked:

PHP's delimiters atop PCRE

The preg_* functions utilize the PCRE regex engine, which itself doesn't care or provide for delimiters. For resemblance with Perl the preg_* functions implement them. Which is also why you can use modifier letters /ism instead of just constants as parameter.

See ext/pcre/php_pcre.c on how the regex string is preprocessed:

  • First all leading whitespace is ignored.

  • Any non-alphanumeric symbol is taken as presumed delimiter. Note that PHP only honors single-byte characters:

    delimiter = *p++;
    if (isalnum((int)*(unsigned char *)&delimiter) || delimiter == '\\') {
            php_error_docref(NULL,E_WARNING, "Delimiter must not…");
            return NULL;
    }
    
  • The rest of the regex string is traversed left-to-right. Only backslash \\-escaped symbols are ignored. \Q and \E escaping is not honored.

  • Should the delimiter be found again, the remainder is verified to only contain modifier letters.

  • If the delimiter is one of the pairable opening braces/brackets ( [ { <, then the matching ) ] } > serves as the end delimiter, and the processing logic is more elaborate.

    int brackets = 1;   /* brackets nesting level */
    while (*pp != 0) {
            if (*pp == '\\' && pp[1] != 0) pp++;
            else if (*pp == end_delimiter && --brackets <= 0)
                    break;
            else if (*pp == start_delimiter)
                    brackets++;
            pp++;
    }
    

    It looks for correctly paired left and right delimiter, but ignores other braces/bracket types when counting.

  • The raw regex string is passed to the PCRE backend only after delimiter and modifier flags have been cut out.

Now this is all somewhat irrelevant. But explains where the delimiter warnings come from. And this whole procedure is all to have a minimum of Perl compatibility. There are a few minor deviations of course, like the […] character class context not receiving special treatment in PHP.
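The scanning steps described above can be approximated in userland PHP. This is only an illustrative sketch of the described logic, not the actual php_pcre.c code; split_regex() is a hypothetical helper name, and modifier validation is left out.

```php
<?php
// Hypothetical helper mimicking how PHP splits "delimiter pattern delimiter modifiers".
function split_regex(string $regex): array
{
    $p = ltrim($regex);                        // leading whitespace is ignored
    if ($p === '') {
        throw new InvalidArgumentException('Empty regular expression');
    }

    $start = $p[0];
    if (ctype_alnum($start) || $start === '\\') {
        throw new InvalidArgumentException('Delimiter must not be alphanumeric or backslash');
    }

    // Opening brackets are closed by their counterpart; anything else by itself.
    $pairs = ['(' => ')', '[' => ']', '{' => '}', '<' => '>'];
    $end   = $pairs[$start] ?? $start;

    $depth = 1;                                // bracket nesting level
    $len   = strlen($p);
    for ($i = 1; $i < $len; $i++) {
        $c = $p[$i];
        if ($c === '\\' && $i + 1 < $len) {
            $i++;                              // skip the backslash-escaped character
        } elseif ($c === $end && --$depth <= 0) {
            break;                             // closing delimiter found
        } elseif ($c === $start && $start !== $end) {
            $depth++;                          // nested opening bracket
        }
    }
    if ($i >= $len) {
        throw new InvalidArgumentException("No ending delimiter '$end' found");
    }

    return [
        'pattern'   => (string) substr($p, 1, $i - 1),
        'modifiers' => (string) substr($p, $i + 1),
    ];
}

$broken = split_regex('<div[^>]*><ul[^>]*>');
$fixed  = split_regex('~<div[^>]*><ul[^>]*>~');
print_r($broken);   // pattern "div[^", modifiers start with "]"
print_r($fixed);    // whole expression as pattern, no modifiers
```

Running it on the pattern from the question shows why the warning names ]: that is the first character left over after the presumed closing > delimiter.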

More references

  • preg_match(); - Unknown modifier '+'
  • Unknown modifier '/' error in PHP
  • PHP RegExpr error Unkown modifier '('
  • Unknown modifier '(' when using preg_match() with a REGEX expression
  • PHP: Regex - Unknown modifier error
  • Warning: preg_match() [function.preg-match]: Unknown modifier '('
  • When does preg_match(): Unknown modifier error occur?
    (Just a well-written question demonstrating prior research)
8 revs, 2 users 96% answered Oct 02 '22 17:10