RegEx failing since PHP 7.3, working in 7.2

Any ideas why this preg_match works up to PHP 7.2 but fails with 7.3+?

$word = 'umweltfreundilch'; //real life example :/
preg_match('/^(?U)(.*(?:[aeiouyäöü])(?:[^aeiouyäöü]))(?X)(.*)$/u', $word, $matches);
var_dump($matches);

Warning: preg_match(): Compilation failed: unrecognized character after (? or (?-

PHP 7.2 and below output:

array(3) {
  [0]=>
  string(16) "umweltfreundilch"
  [1]=>
  string(2) "um"
  [2]=>
  string(14) "weltfreundilch"
}

RegEx seems to be ok, doesn't it?
https://regex101.com/r/LGdhaM/1

mgherkins asked Sep 10 '20 08:09


1 Answer

In PHP 7.3, the Perl-Compatible Regular Expressions (PCRE) extension was upgraded to PCRE2, and all later versions use it.
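
To check which PCRE library a given PHP build actually uses, the predefined PCRE_VERSION constant can be inspected. This is a minimal sketch; the variable names are illustrative, and the exact version string depends on the installation:

```php
<?php
// PCRE_VERSION is a predefined constant holding the library version
// followed by its release date (format "major.minor yyyy-mm-dd").
$pcreMajor = (int) explode('.', PCRE_VERSION)[0];

// PCRE2 releases start at major version 10; the legacy PCRE library is 8.x or lower.
$usesPcre2 = $pcreMajor >= 10;

echo 'PHP ', PHP_VERSION, ' / PCRE ', PCRE_VERSION, PHP_EOL;
echo $usesPcre2
    ? "PCRE2: the inline (?X) option is rejected\n"
    : "Legacy PCRE: the inline (?X) option is accepted\n";
```

Checking the library version rather than PHP_VERSION is the safer test, since the pattern syntax is defined by the PCRE library itself.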

The PCRE2 syntax documentation does not list (?X) as an available inline modifier option. Here are the supported options:

  (?i)            caseless
  (?J)            allow duplicate named groups
  (?m)            multiline
  (?n)            no auto capture
  (?s)            single line (dotall)
  (?U)            default ungreedy (lazy)
  (?x)            extended: ignore white space except in classes
  (?xx)           as (?x) but also ignore space and tab in classes
  (?-...)         unset option(s)
  (?^)            unset imnsx options
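
The list above can be verified against the local installation by trying to compile a trivial pattern with each inline option. This is only a probing sketch: the option list and the $supported array are illustrative, and the result depends on the PCRE library the PHP build is linked against.

```php
<?php
// Try to compile a trivial pattern with each inline option.
// preg_match() returns false (and raises a warning) when compilation fails;
// the @ operator only silences that warning for this probe.
$options = ['i', 'J', 'm', 'n', 's', 'U', 'x', 'xx', 'X'];

$supported = [];
foreach ($options as $opt) {
    $supported[$opt] = @preg_match('/(?' . $opt . ')a/', 'a') !== false;
}

// On a PCRE2 build, the 'X' entry is expected to be false.
var_dump($supported);
```

Comparing against false with !== matters here, because preg_match() returns 0 for "compiled but no match" and false only for an error.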

However, you can still pass the X flag after the trailing delimiter:

preg_match('/^(?U)(.*[aeiouyäöü][^aeiouyäöü])(.*)$/Xu', $word, $matches);

See PHP 7.4 demo.

To cancel the effect of (?U), you can use either of two options. The first is a (?-U) inline modifier:

preg_match('/^(?U)(.*[aeiouyäöü][^aeiouyäöü])(?-U)(.*)$/u', $word, $matches);
//                                           ^^^^^

The second is to enclose the affected part of the pattern in a (?U:...) modifier group:

preg_match('/^(?U:(.*[aeiouyäöü][^aeiouyäöü]))(.*)$/u', $word, $matches);
//            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^        
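
As a quick sanity check, both rewrites can be run against the word from the question to confirm that they yield the same two captures as the PHP 7.2 output. The $patterns and $results names are illustrative only, and the source file is assumed to be saved as UTF-8:

```php
<?php
$word = 'umweltfreundilch';

// The two ways of limiting (?U) to the first capture group.
$patterns = [
    'inline (?-U)'  => '/^(?U)(.*[aeiouyäöü][^aeiouyäöü])(?-U)(.*)$/u',
    'group (?U:...)' => '/^(?U:(.*[aeiouyäöü][^aeiouyäöü]))(.*)$/u',
];

$results = [];
foreach ($patterns as $label => $pattern) {
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    if (preg_match($pattern, $word, $m) === 1) {
        $results[$label] = [$m[1], $m[2]];
    }
}

// Both entries should be ["um", "weltfreundilch"].
var_dump($results);
```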

See more about changes to regex handling in PHP 7.3+ in "preg_match(): Compilation failed: invalid range in character class at offset".

Wiktor Stribiżew answered Oct 10 '22 09:10