 

What does the "[^][]" regex mean?

Tags: regex, php

I found it in the following regex:

\[(?:[^][]|(?R))*\] 

It matches square brackets (with their content) together with nested square brackets.

asked Jul 24 '13 21:07 by Emanuil Rusev




1 Answer

[^][] is a character class that means all characters except [ and ].

You can avoid escaping the special characters [ and ] here because the pattern is not ambiguous for PCRE, the regex engine used by PHP's preg_ functions.

Since an empty negated class [^] is not valid in PCRE, the only way for the regex to parse is that the ] is a literal inside a character class that will be closed later. The same goes for the [ that follows: it cannot open a new character class inside a character class (the exception being a POSIX class such as [:alnum:]). The last ] is then unambiguous: it ends the character class. A [ outside a character class, however, must be escaped, since it would be parsed as the beginning of a character class.

In the same way, you can write []] or [[] or [^[] without escaping the [ or ] in the character class.
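Python's standard re module parses these classes the same way (its documentation notes that a ] placed at the beginning of a set is taken literally), so it works as a quick sandbox for the non-recursive parts. It has no (?R), so the recursion itself cannot be tried there. A minimal sketch, assuming Python 3:

```python
import re

# [^][] : one character that is neither ']' nor '['.
# The ']' right after '[^' is a literal member of the class, and the
# '[' that follows cannot open a new class, so it is a literal too.
NOT_BRACKET = re.compile(r"[^][]")

print(NOT_BRACKET.findall("a[b]c"))        # ['a', 'b', 'c']
print(re.findall(r"[^][]+", "ab[cd]e"))    # ['ab', 'cd', 'e']

# The other unescaped forms from the text:
print(re.findall(r"[]]", "a]b]"))          # [']', ']']   only ']'
print(re.findall(r"[^[]+", "ab[cd"))       # ['ab', 'cd'] anything but '['

# Outside a class, '[' must be escaped but the closing ']' need not be:
# a flat (non-nested) bracket pair.
print(re.fullmatch(r"\[[^][]*]", "[abc]") is not None)  # True
```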

Note: since PHP 7.3, you can use the inline xx modifier, which makes the engine ignore unescaped spaces and tabs even inside character classes. This lets you write these classes in a less ambiguous way, like this: (?xx) [^ ][ ] [ ] ] [ [ ] [^ [ ].

You can use this syntax with several regex flavours: PCRE (PHP, R), Perl, Python, Java, .NET, Go, awk, Tcl (if you delimit your pattern with curly brackets; thanks Donal Fellows), ...

But not with: Ruby, JavaScript (except for IE < 9), ...

As m.buettner noted, [^]] is not ambiguous because ] is the first character of the class. By contrast, [^a]] is read as any character other than a, followed by a literal ]. To exclude both a and ], you must write [^a\]] or [^]a].
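The difference is easy to check with Python's re, which reads these three classes the same way PCRE does. In this sketch, kept is a hypothetical helper (not a library function) that returns the characters a one-character pattern accepts:

```python
import re


def kept(pattern, text):
    """Characters of `text` that `pattern` matches on their own."""
    return [c for c in text if re.fullmatch(pattern, c)]


# [^a]] is "one character other than a" FOLLOWED BY a literal "]".
print(re.fullmatch(r"[^a]]", "b]") is not None)  # True (two characters)
print(kept(r"[^a]]", "ab]c"))                    # []   (never matches one char)

# To exclude both "a" and "]" in a single class:
print(kept(r"[^a\]]", "ab]c"))                   # ['b', 'c']
print(kept(r"[^]a]", "ab]c"))                    # ['b', 'c']
```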

In the particular case of JavaScript, the specification allows [] as a regex token that never matches (in other words, [] always fails) and [^] as one that matches any character. As a result, [^]] is read as any character followed by a ]. Actual implementations vary, but modern browsers generally stick to the definition in the specification.

Pattern details:

\[          # literal [
(?:         # open a non-capturing group
    [^][]   # a character that is not a ] or a [
  |         # OR
    (?R)    # the whole pattern (here is the recursion)
)*          # repeat zero or more times
\]          # a literal ]
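Since Python's standard re module has no (?R), the recursion can only be imitated there. The sketch below is a hand-written stand-in, not a regex: match_nested and find_all_nested are invented names that follow the pattern step by step (a [, then any number of non-bracket characters or recursive calls, then a ]). Because the loop can only stop on a ], the match that starts at a given [ is unique, so the sketch needs no backtracking.

```python
def match_nested(s, i):
    r"""Hand-written stand-in for \[(?:[^][]|(?R))*\] anchored at s[i].

    Returns the index just past the closing ']', or None if there is no match.
    """
    if i >= len(s) or s[i] != "[":      # \[ : must start on an opening bracket
        return None
    j = i + 1
    while j < len(s):                   # (?: ... )* : loop until a ']'
        c = s[j]
        if c == "]":                    # \] : the group is closed
            return j + 1
        if c == "[":                    # (?R) : a nested group, recurse
            j = match_nested(s, j)
            if j is None:               # nested group never closed
                return None
        else:                           # [^][] : one non-bracket character
            j += 1
    return None                         # input ended before the closing ']'


def find_all_nested(s):
    """Scan left to right and collect non-overlapping top-level matches."""
    found, i = [], 0
    while i < len(s):
        end = match_nested(s, i)
        if end is None:
            i += 1                      # no match here, try the next position
        else:
            found.append(s[i:end])
            i = end                     # continue after the match
    return found


print(find_all_nested("x[a[b]c]y[d]"))  # ['[a[b]c]', '[d]']
print(find_all_nested("[a[b]"))         # ['[b]']
```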

In your example pattern, you don't need to escape the last ].

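You can also write the same thing as a slightly optimized pattern that is more useful because it can be reused as a subpattern, thanks to (?-1), which recurses into the most recently opened capturing group instead of the whole pattern: (\[(?:[^][]+|(?-1))*+])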

(                     # open the capturing group
    \[                # a literal [
    (?:               # open a non-capturing group
        [^][]+        # all characters but ] or [, one or more times
      |               # OR
        (?-1)         # the last opened capturing group (recursion)
                      # (the capture group where you are)
    )*+               # repeat the group zero or more times (possessive)
    ]                 # literal ] (no need to escape)
)                     # close the capturing group
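The reuse that (?-1) allows can be imitated the same way. In this hypothetical Python sketch, match_group stands in for the capturing group: it consumes whole runs of non-bracket characters with the stdlib pattern [^][]+ and calls itself where the regex would use (?-1). match_indexed (also an invented name) embeds it in a larger pattern, roughly as \w+(\[(?:[^][]+|(?-1))*+]) would in PCRE. The hand-written loop never gives back what it has consumed, which is the effect the possessive *+ asks for.

```python
import re

RUN = re.compile(r"[^][]+")   # [^][]+ : a run of non-bracket characters
WORD = re.compile(r"\w+")     # \w+ : the hypothetical prefix before the group


def match_group(s, i):
    r"""Stand-in for (\[(?:[^][]+|(?-1))*+]) anchored at s[i]; end index or None."""
    if not s.startswith("[", i):          # \[
        return None
    j = i + 1
    while True:
        run = RUN.match(s, j)
        if run:                           # [^][]+ : take the whole run at once
            j = run.end()
        elif s.startswith("[", j):        # (?-1) : recurse into this group
            j = match_group(s, j)
            if j is None:
                return None
        elif s.startswith("]", j):        # ] : close the group
            return j + 1
        else:                             # end of input: unbalanced
            return None


def match_indexed(s):
    r"""Hypothetical larger pattern \w+(group): a word followed by one group."""
    word = WORD.match(s)
    if not word:
        return None
    end = match_group(s, word.end())
    return None if end is None else s[:end]


print(match_indexed("items[a[b]c] rest"))  # items[a[b]c]
print(match_indexed("items[a[b] rest"))    # None
```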

Or, better: (\[[^][]*(?:(?-1)[^][]*)*+]), which avoids the cost of an alternation.
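The unrolled form fixes the order of the pieces: a run of non-brackets, then repeatedly one nested group followed by another run, then the closing ]. The hypothetical Python sketch below (function names invented) transcribes it next to the plain alternation form and checks by brute force, over every string of up to 8 characters drawn from [, ] and a, that both accept exactly the same inputs. It illustrates only that the two shapes are equivalent; it says nothing about PCRE's actual performance.

```python
import itertools
import re

RUN = re.compile(r"[^][]*")   # [^][]* : a possibly empty run of non-brackets


def match_unrolled(s, i):
    r"""Stand-in for (\[[^][]*(?:(?-1)[^][]*)*+]) anchored at s[i]."""
    if not s.startswith("[", i):          # \[
        return None
    j = RUN.match(s, i + 1).end()          # [^][]*
    while s.startswith("[", j):            # (?: (?-1) [^][]* )*+
        j = match_unrolled(s, j)
        if j is None:
            return None
        j = RUN.match(s, j).end()
    return j + 1 if s.startswith("]", j) else None   # ]


def match_alternation(s, i):
    r"""Stand-in for \[(?:[^][]|(?R))*\] anchored at s[i], for comparison."""
    if not s.startswith("[", i):
        return None
    j = i + 1
    while j < len(s) and s[j] != "]":
        j = match_alternation(s, j) if s[j] == "[" else j + 1
        if j is None:
            return None
    return j + 1 if j < len(s) else None


# Every string of length <= 8 over "[]a" gets the same result from both.
for n in range(9):
    for chars in itertools.product("[]a", repeat=n):
        s = "".join(chars)
        assert match_unrolled(s, 0) == match_alternation(s, 0), s

print(match_unrolled("[a[b][c]d]", 0))   # 10
```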

answered Oct 05 '22 14:10 by Casimir et Hippolyte