What does the "[^][]" regex mean?

Tags:

regex

php

I found it in the following regex:

\[(?:[^][]|(?R))*\]

It matches square brackets (with their content) together with nested square brackets.


asked Jul 24 '13 21:07

Emanuil Rusev

1 Answer

[^][] is a character class that means all characters except [ and ].

You don't need to escape the [ and ] special characters here because the class is not ambiguous for PCRE, the regex engine used by the preg_ functions.

Since [^] is not a valid (empty) class in PCRE, the only way for the regex to parse is for the ] to be a literal inside the character class, which is closed later. The same goes for the [ that follows: a character class cannot be opened inside another character class (except for a POSIX class such as [:alnum:]), so it is a literal too. The last ] is then unambiguous: it ends the character class. However, a [ outside a character class must be escaped, since it is parsed as the start of a character class.

In the same way, you can write []] or [[] or [^[] without escaping the [ or ] in the character class.
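Python's re module, one of the flavors listed below, applies the same rule (a ] placed right after [ or [^ is a literal), so these classes can be tried there directly. A minimal sketch; the constant names are only illustrative:

```python
import re

# "]" right after "[^" is a literal, so this class excludes both brackets.
NOT_BRACKET = re.compile(r"[^][]")
# "]" right after "[" is a literal: a class containing only "]".
ONLY_CLOSE = re.compile(r"[]]")
# Inside a class, "[" is a literal: anything except "[".
NOT_OPEN = re.compile(r"[^[]")

text = "a[b]c"
print(NOT_BRACKET.findall(text))  # ['a', 'b', 'c']
print(ONLY_CLOSE.findall(text))   # [']']
print(NOT_OPEN.findall(text))     # ['a', 'b', ']', 'c']
```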

Note: since PHP 7.3, you can use the inline xx modifier, which makes PCRE ignore space and tab characters even inside character classes. This lets you write these classes in a less ambiguous way: (?xx) [^ ][ ] [ ] ] [ [ ] [^ [ ].

This syntax works in several regex flavors: PCRE (PHP, R), Perl, Python, Java, .NET, Go, awk, Tcl (if you delimit your pattern with curly brackets, thanks Donal Fellows), ...

But not with: Ruby, JavaScript (except for IE < 9), ...

As m.buettner noted, [^]] is not ambiguous because ] is the first character of the class, but [^a]] is parsed as any character that is not a, followed by a literal ]. To exclude both a and ], you must write [^a\]] or [^]a].
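The difference between these three classes is easy to check in Python's re, which parses them the same way. A small sketch on an arbitrary sample string:

```python
import re

TEXT = "xa]y]"

# Parsed as [^a] followed by a literal "]": a two-character match.
print(re.findall(r"[^a]]", TEXT))   # ['y]']
# Escaped "]": one character that is neither "a" nor "]".
print(re.findall(r"[^a\]]", TEXT))  # ['x', 'y']
# "]" first in the class: same meaning, no escape needed.
print(re.findall(r"[^]a]", TEXT))   # ['x', 'y']
```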

In the particular case of JavaScript, the specification allows [] as a token that never matches (in other words, [] always fails) and [^] as a token that matches any character. [^]] is therefore seen as any character followed by a ]. Actual implementations vary, but modern browsers generally stick to the definition in the specification.

Pattern details:

\[          # literal [
(?:         # open a non-capturing group
    [^][]   # a character that is not a ] or a [
  |         # OR
    (?R)    # the whole pattern (here is the recursion)
)*          # repeat zero or more times
\]          # a literal ]
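Python's standard re module does not support (?R), so it cannot run this pattern as-is. As a hand-written stand-in (not PCRE), the hypothetical helpers below follow the pattern step by step: a [ opens a nested group (the (?R) branch), any other non-bracket character is consumed (the [^][] branch), and ] closes the group. Because the two branches start with different characters, the regex never has a useful choice to backtrack into, so a single left-to-right pass should find the same matches as a global preg_match_all scan:

```python
def match_group(s: str, i: int):
    r"""Return the end index of a balanced [...] group starting at s[i], or None.

    Mirrors \[(?:[^][]|(?R))*\] without using a regex engine.
    """
    if i >= len(s) or s[i] != "[":
        return None
    i += 1
    while i < len(s):
        c = s[i]
        if c == "]":
            return i + 1             # the closing \]
        if c == "[":
            end = match_group(s, i)  # the (?R) branch
            if end is None:
                return None
            i = end
        else:
            i += 1                   # the [^][] branch
    return None                      # input ended before the closing ]


def find_bracket_groups(s: str):
    """Leftmost, non-overlapping matches, like a global match-all scan."""
    found, pos = [], 0
    while pos < len(s):
        end = match_group(s, pos)
        if end is None:
            pos += 1                 # no match here: retry one position later
        else:
            found.append(s[pos:end])
            pos = end
    return found


print(find_bracket_groups("a[b[c]d]e[f"))  # ['[b[c]d]']
print(find_bracket_groups("[[x]"))         # ['[x]']
```

The second call shows what happens with unbalanced input: the match attempt at the first [ fails, and the scan then finds the balanced [x] inside it.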

In your pattern, you don't need to escape the last ]: outside a character class, a ] with no opening [ is just a literal.
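Python's re treats a stray ] outside a class as a literal as well. A sketch with a flat (non-recursive) version of the pattern, since re has no (?R):

```python
import re

# The final "]" is not escaped: outside a class, an unmatched "]" is a literal.
FLAT = re.compile(r"\[[^][]*]")

print(FLAT.findall("[a] b [c d]"))  # ['[a]', '[c d]']
```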

But you can do the same with a slightly optimized pattern that is also more useful, because it can be reused as a subpattern (thanks to (?-1)): (\[(?:[^][]+|(?-1))*+])

(                     # open the capturing group
    \[                # a literal [
        (?:           # open a non-capturing group
            [^][]+    # all characters but ] or [, one or more times
          |           # OR
            (?-1)     # the last opened capturing group (recursion)
                      # (the capture group where you are)
        )*+           # repeat the group zero or more times (possessive)
    ]                 # literal ] (no need to escape)
)                     # close the capturing group

Or, better, (\[[^][]*(?:(?-1)[^][]*)*+]), which avoids the cost of an alternation.
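The alternation-free form is an instance of "unrolling the loop", which does not depend on recursion. Since Python's re cannot recurse, this sketch uses hypothetical depth-limited variants (at most one level of nesting) to check that the alternation form and the unrolled form accept the same strings. The possessive *+ is left out because re only supports possessive quantifiers from Python 3.11 on:

```python
import re

# Non-recursive stand-ins allowing at most one level of nesting.
# Alternation form: each repetition picks one character or one nested group.
ALTERNATION = re.compile(r"\[(?:[^][]|\[[^][]*])*]")
# Unrolled form: runs of [^][]* around each nested group, no alternation.
UNROLLED = re.compile(r"\[[^][]*(?:\[[^][]*][^][]*)*]")

for s in ["[]", "[a]", "[a[b]c]", "[[][]]", "[a[b[c]]]", "[a"]:
    a = ALTERNATION.fullmatch(s) is not None
    u = UNROLLED.fullmatch(s) is not None
    print(f"{s!r}: alternation={a} unrolled={u}")
```

Both forms reject "[a[b[c]]]" only because of the one-level depth limit of these stand-ins; the recursive PCRE versions accept any depth.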


answered Oct 05 '22 14:10

Casimir et Hippolyte
