List of all special characters that need to be escaped in a regex

Tags:

java

regex

People also ask

What characters have to be escaped regex?

Operators: * + ? |
Anchors: ^ $
Others: . \

To use a literal ^ at the start or a literal $ at the end of a regex, the character must be escaped.
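To make the difference concrete, here is a minimal Java sketch of escaped versus unescaped anchors and one operator. The class name `AnchorEscapeDemo` and the helper `found` are made up for illustration; only `Pattern` and `Matcher` come from the JDK.

```java
import java.util.regex.Pattern;

class AnchorEscapeDemo {
    // Hypothetical helper: true if the regex is found anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Unescaped ^ anchors to the start of the input, so it never sees the caret in "2^10".
        System.out.println(found("^10", "2^10"));      // false
        // Escaped \^ matches the caret character itself.
        System.out.println(found("\\^10", "2^10"));    // true

        // Unescaped $ means "end of input", and nothing can follow the end.
        System.out.println(found("$5", "cost: $5"));   // false
        System.out.println(found("\\$5", "cost: $5")); // true

        // Unescaped + is a quantifier ("one or more 1, then 1"), not a plus sign.
        System.out.println(found("1+1", "1+1"));       // false
        System.out.println(found("1\\+1", "1+1"));     // true
    }
}
```

Note the doubled backslashes: the Java string literal "\\^" is the two-character regex \^.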

Do I need to escape period in regex?

The . (dot) is a metacharacter that can match any single character (letter, digit, whitespace, and so on). This overrides the matching of the literal period character, so to match a period specifically you need to escape the dot with a backslash: \.
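A quick Java sketch of the dot behaviour, assuming default flags. The class name `DotEscapeDemo` and its helper are invented for the example.

```java
import java.util.regex.Pattern;

class DotEscapeDemo {
    // Hypothetical helper: true if the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Unescaped dot accepts the period, but also any other single character.
        System.out.println(matches("a.c", "a.c"));    // true
        System.out.println(matches("a.c", "abc"));    // true

        // Escaped dot accepts only a literal period.
        System.out.println(matches("a\\.c", "a.c"));  // true
        System.out.println(matches("a\\.c", "abc"));  // false
    }
}
```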


You can look at the javadoc of the Pattern class: http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html

You need to escape any char listed there if you want the regular char and not the special meaning.

As a maybe simpler solution, you can put the template between \Q and \E - everything between them is considered as escaped.
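For example, a rough sketch of the \Q...\E approach could look like this. The class name `QuoteBlockDemo` and the helper `literal` are hypothetical, and the helper assumes the text does not itself contain \E, which would end the quoted section early.

```java
import java.util.regex.Pattern;

class QuoteBlockDemo {
    // Hypothetical helper: wraps text in \Q...\E so it is read literally.
    // Assumption: text does not contain the sequence \E.
    static String literal(String text) {
        return "\\Q" + text + "\\E";
    }

    public static void main(String[] args) {
        String template = "1+1=2?";

        // Used directly, + and ? act as quantifiers, so the text does not match itself.
        System.out.println(Pattern.matches(template, template));           // false

        // Quoted, every character is taken literally.
        System.out.println(Pattern.matches(literal(template), template));  // true

        // A quoted part can be combined with ordinary regex syntax.
        String regex = literal("a.b") + "\\d+";
        System.out.println(Pattern.matches(regex, "a.b42"));  // true
        System.out.println(Pattern.matches(regex, "axb42"));  // false
    }
}
```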


  • Java characters that have to be escaped in regular expressions are:
    \.[]{}()<>*+-=!?^$|
  • Two of the closing brackets (] and }) only need to be escaped after an opening bracket of the same type.
  • Inside [] brackets, some characters (like + and -) sometimes work without an escape.
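The bracket rules above can be tried out with a small sketch like the following. The class name `BracketEscapeDemo` is made up, and the behaviour of unmatched ] and } is what the standard OpenJDK implementation does; treat it as an assumption on other runtimes.

```java
import java.util.regex.Pattern;

class BracketEscapeDemo {
    // Hypothetical helper: true if the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Inside a character class, + is literal, and - is literal when it comes last.
        System.out.println(matches("[+-]", "+"));     // true
        System.out.println(matches("[+-]", "-"));     // true

        // Between two characters, - forms a range, so it must be escaped to be literal.
        System.out.println(matches("[a-c]", "-"));    // false
        System.out.println(matches("[a\\-c]", "-"));  // true
        System.out.println(matches("[a\\-c]", "b"));  // false

        // A closing bracket with no matching opener is read as a literal.
        System.out.println(matches("a]", "a]"));      // true
        System.out.println(matches("a}", "a}"));      // true
    }
}
```

Escaping these characters anyway is harmless in Java, since a backslash before any non-alphabetic character is allowed.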

To escape a whole string, you can use this (available since Java 1.5):

Pattern.quote("$test");

The resulting pattern matches exactly the text $test.
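A slightly fuller sketch of how Pattern.quote might be used in a search. The class name `PatternQuoteDemo` and the helper `containsLiteral` are invented for illustration; `Pattern.quote` itself is the real JDK method.

```java
import java.util.regex.Pattern;

class PatternQuoteDemo {
    // Hypothetical helper: searches haystack for needle, treating needle as plain text.
    static boolean containsLiteral(String haystack, String needle) {
        return Pattern.compile(Pattern.quote(needle)).matcher(haystack).find();
    }

    public static void main(String[] args) {
        String text = "price is $test today";

        // Without quoting, $ is an end-of-input anchor and the search fails.
        System.out.println(Pattern.compile("$test").matcher(text).find()); // false

        // With quoting, $test is found as literal text.
        System.out.println(containsLiteral(text, "$test"));                // true

        // quote() wraps the string in \Q...\E.
        System.out.println(Pattern.quote("$test"));                        // \Q$test\E
    }
}
```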


According to the String Literals / Metacharacters documentation page, they are:

<([{\^-=$!|]})?*+.>

It would also be nice to have that list referenced somewhere in code, but I don't know where that could be...


Combining what everyone said, I propose the following, to keep the list of characters special to RegExp clearly listed in their own String, and to avoid having to try to visually parse thousands of "\\"'s. This seems to work pretty well for me:

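import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Characters with special meaning in Java regular expressions.
final String regExSpecialChars = "<([{\\^-=$!|]})?*+.>";
// Prefix every character with a backslash so the list is safe inside a character class.
final String regExSpecialCharsRE = regExSpecialChars.replaceAll(".", "\\\\$0");
final Pattern reCharsREP = Pattern.compile("[" + regExSpecialCharsRE + "]");

// Returns s with every special character preceded by a backslash.
String quoteRegExSpecialChars(String s)
{
    Matcher m = reCharsREP.matcher(s);
    return m.replaceAll("\\\\$0");
}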
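As a quick sanity check, the same idea can be wrapped in a class and round-tripped: an escaped string, used as a regex, should match the original string. This is only a sketch; the names `RegexEscaper`, `SPECIAL`, and `escape` are made up here.

```java
import java.util.regex.Pattern;

class RegexEscaper {
    // Same character list as above, kept in one readable string.
    private static final String SPECIAL = "<([{\\^-=$!|]})?*+.>";

    // Character class built by prefixing each listed character with a backslash.
    private static final Pattern SPECIAL_CLASS =
            Pattern.compile("[" + SPECIAL.replaceAll(".", "\\\\$0") + "]");

    // Returns s with every listed character preceded by a backslash.
    static String escape(String s) {
        return SPECIAL_CLASS.matcher(s).replaceAll("\\\\$0");
    }

    public static void main(String[] args) {
        String raw = "(.*) costs $5 [sale!]";
        String escaped = escape(raw);

        System.out.println(escaped);                        // \(\.\*\) costs \$5 \[sale\!\]
        // Round trip: the escaped form matches the original text exactly.
        System.out.println(Pattern.matches(escaped, raw));  // true
    }
}
```

This escapes a few more characters than strictly necessary, which is harmless in Java because a backslash before a non-alphabetic character is always allowed.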

On @Sorin's suggestion of the Java Pattern docs, it looks like chars to escape are at least:

\.[{(*+?^$|

Although the question is about Java, the code can easily be adapted from this Kotlin String extension I came up with (adapted from the one @brcolow provided):

private val escapeChars = charArrayOf(
    '<',
    '(',
    '[',
    '{',
    '\\',
    '^',
    '-',
    '=',
    '$',
    '!',
    '|',
    ']',
    '}',
    ')',
    '?',
    '*',
    '+',
    '.',
    '>'
)

fun String.escapePattern(): String {
    return this.fold("") {
      acc, chr ->
        acc + if (escapeChars.contains(chr)) "\\$chr" else "$chr"
    }
}

fun main() {
    println("(.*)".escapePattern())
}

This prints \(\.\*\)

Check it in action here: https://pl.kotl.in/h-3mXZkNE