 

Why closing square bracket "]" doesn't require escaping in regex?

Consider the array:

new Pattern[] {Pattern.compile("\\["),Pattern.compile("\\]") };

IntelliJ IDEA tells me that the \\ is redundant and suggests replacing it with a plain ], i.e. the result is:

new Pattern[] {Pattern.compile("\\["),Pattern.compile("]") };

Why is the \\ OK in the first pattern, Pattern.compile("\\["), but redundant in the second?

Cherry asked Mar 27 '17 11:03




1 Answer

The ] symbol is not a special regex operator outside a character class when there is no corresponding unescaped [ before it, and only special characters require escaping. A [, by contrast, is a special regex operator outside a character class, because it may mark the starting point of one. Once the Java regular expression engine sees an unescaped [ in the pattern, it knows there must be a ] further ahead to close the character class. If there is no opening [ in the expression, the ] is treated as a mere literal ] symbol, and whether or not it is escaped makes no difference to the engine. So, [abc] will match a, b or c, while \[abc] or \[abc\] will match the literal character sequence [abc].

So, outside a character class, a literal [ should always be escaped, while ] does not have to be.
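As a quick check of the behaviour described above, here is a minimal Java sketch. The class name BracketOutsideClassDemo and the helpers fullMatch and compiles are hypothetical names made up for illustration; only java.util.regex.Pattern and PatternSyntaxException come from the JDK.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class, not part of the original question or answer.
class BracketOutsideClassDemo {
    // True if the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // True if Java accepts the regex, false if it throws PatternSyntaxException.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A lone ] is a literal, so the escaped and unescaped forms are equivalent.
        System.out.println(fullMatch("]", "]"));            // true
        System.out.println(fullMatch("\\]", "]"));          // true
        // An unescaped [ opens a character class that is never closed.
        System.out.println(compiles("["));                  // false
        System.out.println(compiles("\\["));                // true
        // \[abc] and \[abc\] both match the literal text [abc].
        System.out.println(fullMatch("\\[abc]", "[abc]"));  // true
        System.out.println(fullMatch("\\[abc\\]", "[abc]")); // true
        // [abc] is a character class that matches a single a, b or c.
        System.out.println(fullMatch("[abc]", "b"));        // true
    }
}
```

This is why the IDE flags only the second pattern: dropping the backslashes from "\\]" leaves the meaning unchanged, whereas dropping them from "\\[" produces a pattern that does not compile.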

Inside a character class, both [ and ] must be escaped in a Java regular expression, because they may form intersection/subtraction patterns. The one exception is a ] that appears at the very beginning of the character class (e.g. "[a]".replaceAll("[]\\[]", "") returns a).
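The inside-a-class rules can be sketched in Java as follows. The class and method names (BracketInsideClassDemo, stripBrackets and so on) are hypothetical, chosen only for this illustration; the patterns themselves use standard java.util.regex syntax.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
class BracketInsideClassDemo {
    // Removes every [ and ]; both brackets are escaped inside the class.
    static String stripBrackets(String input) {
        return input.replaceAll("[\\[\\]]", "");
    }

    // Same effect, relying on ] being a literal when it is the first character of the class.
    static String stripBracketsLeadingBracket(String input) {
        return input.replaceAll("[]\\[]", "");
    }

    // An unescaped [ inside a class starts a nested class (a union): [a[bc]] behaves like [abc].
    static boolean inNestedUnion(String s) {
        return Pattern.matches("[a[bc]]", s);
    }

    // && inside a class is an intersection: lowercase letters that are not vowels.
    static boolean isLowercaseConsonant(String s) {
        return Pattern.matches("[a-z&&[^aeiou]]", s);
    }

    public static void main(String[] args) {
        System.out.println(stripBrackets("[a][b]"));            // ab
        System.out.println(stripBracketsLeadingBracket("[a]")); // a
        System.out.println(inNestedUnion("b"));                 // true
        System.out.println(isLowercaseConsonant("b"));          // true
        System.out.println(isLowercaseConsonant("a"));          // false
    }
}
```

The nested-class and && forms are the reason an unescaped [ inside a Java character class is not treated as a plain literal.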

Other regex flavors

ICU, Onigmo - In the ICU and Onigmo regex flavors, ] behaves the same as in the Java regex flavor. Languages affected: swift, ruby, r (stringr), kotlin, groovy.

PCRE, Boost, .NET, RE2, Python, POSIX - In these flavors, ] is not a special char (i.e. it needs no escaping) outside a character class, and is a special char (i.e. it needs escaping) inside a character class, where it may be left unescaped only if it is the first char in the character class. It is not an error to escape it everywhere it is supposed to match a literal ] char. Languages/tools affected: php, perl, c#/vb.net/etc., python, sed, grep, awk, elixir, r (both the default base R TRE and PCRE enabled with "perl=TRUE"), tcl, google-sheets.

ECMAScript - In ECMAScript flavors, ] is not special outside a character class, while [ is special outside a character class. Inside a character class, ] must ALWAYS be escaped, even if it is the first char in the character class, so be careful here. [ inside a character class is not special, and escaping it is allowed, even if the regexp is compiled with the /u flag (in JavaScript). Languages affected: javascript, dart, c++, vba, google-apps-script (which uses JavaScript).

Wiktor Stribiżew answered Oct 08 '22 02:10