Escaping special characters in Java Regular Expressions

Is there any method in Java or any open source library for escaping (not quoting) a special character (meta-character), in order to use it as a regular expression?

This would be very handy in dynamically building a regular expression, without having to manually escape each individual character.

For example, consider a simple regex like \d+\.\d+ that matches numbers with a decimal point like 1.2, as well as the following code:

String digit = "d";
String point = ".";
String regex1 = "\\d+\\.\\d+";
String regex2 = Pattern.quote(digit + "+" + point + digit + "+");

Pattern numbers1 = Pattern.compile(regex1);
Pattern numbers2 = Pattern.compile(regex2);

System.out.println("Regex 1: " + regex1);

if (numbers1.matcher("1.2").matches()) {
    System.out.println("\tMatch");
} else {
    System.out.println("\tNo match");
}

System.out.println("Regex 2: " + regex2);

if (numbers2.matcher("1.2").matches()) {
    System.out.println("\tMatch");
} else {
    System.out.println("\tNo match");
}

Not surprisingly, the output produced by the above code is:

Regex 1: \d+\.\d+
    Match
Regex 2: \Qd+.d+\E
    No match

That is, regex1 matches 1.2 but regex2 (which is "dynamically" built) does not (instead, it matches the literal string d+.d+).

So, is there a method that would automatically escape each regex meta-character?

If there were, let's say, a static escape() method in java.util.regex.Pattern, the output of

Pattern.escape('.') 

would be the string "\.", but

Pattern.escape(',') 

should just produce ",", since it is not a meta-character. Similarly,

Pattern.escape('d') 

could produce "\d", since 'd' is used to denote digits (although escaping may not make sense in this case: an unescaped 'd' simply means a literal 'd', which the regex interpreter would not mistake for something else, as it would with '.').

PNS asked May 19 '12 10:05


1 Answer

Is there any method in Java or any open source library for escaping (not quoting) a special character (meta-character), in order to use it as a regular expression?

If you are looking for a way to create constants that you can use in your regex patterns, then just prepending them with "\\" should work, but there is no built-in Pattern.escape('.') method to do this for you.

So if you are trying to match "\\d" (the literal string \d instead of a decimal digit), then you would do:

// this will match on \d as opposed to a decimal digit
String matchBackslashD = "\\\\d";
// as opposed to
String matchDecimalDigit = "\\d";
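To see the two levels of escaping side by side, here is a small sketch that compiles both strings and checks what each one matches. The class name BackslashLevels and its field names are made up for illustration; only Pattern.compile and Matcher.matches come from the JDK.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: contrasts the regex for a literal "\d" with the digit class.
class BackslashLevels {
    // Java string "\\\\d" is the regex \\d, which matches a backslash followed by 'd'.
    static final Pattern LITERAL_BACKSLASH_D = Pattern.compile("\\\\d");
    // Java string "\\d" is the regex \d, which matches one decimal digit.
    static final Pattern DECIMAL_DIGIT = Pattern.compile("\\d");

    public static void main(String[] args) {
        String backslashD = "\\d"; // the two characters '\' and 'd'
        System.out.println(LITERAL_BACKSLASH_D.matcher(backslashD).matches()); // true
        System.out.println(LITERAL_BACKSLASH_D.matcher("7").matches());        // false
        System.out.println(DECIMAL_DIGIT.matcher("7").matches());              // true
        System.out.println(DECIMAL_DIGIT.matcher(backslashD).matches());       // false
    }
}
```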

The 4 backslashes in the Java string literal turn into 2 backslashes in the regex pattern, and 2 backslashes in a regex pattern match a single literal backslash. Prepending a backslash to any special character turns it into a normal character instead of a special one.

String matchPeriod = "\\.";
String matchPlus = "\\+";
String matchParens = "\\(\\)";
...
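If you want that prepending done automatically, you could write a small helper yourself. The following is only a sketch: escapeMeta and the META character set are my own names and choices, not part of java.util.regex, and the set assumes the usual metacharacters outside a character class with no special flags such as Pattern.COMMENTS.

```java
import java.util.regex.Pattern;

// Hypothetical helper: backslash-escapes regex metacharacters one by one.
class RegexEscapeSketch {
    // Assumption: these are the characters treated as special outside a character class.
    private static final String META = "\\^$.|?*+()[]{}";

    static String escapeMeta(String literal) {
        StringBuilder sb = new StringBuilder(literal.length() * 2);
        for (int i = 0; i < literal.length(); i++) {
            char c = literal.charAt(i);
            if (META.indexOf(c) >= 0) {
                sb.append('\\'); // prepend a backslash to make the character literal
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Build the regex from the question dynamically.
        String regex = "\\d+" + escapeMeta(".") + "\\d+";
        System.out.println(regex);                          // \d+\.\d+
        System.out.println(Pattern.matches(regex, "1.2"));  // true
        System.out.println(Pattern.matches(regex, "1x2"));  // false
        System.out.println(escapeMeta(","));                // ,
    }
}
```

Note that letters such as 'd' are deliberately left alone here: an unescaped 'd' is already a literal, and adding a backslash would change its meaning to the digit class.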

In your post you use the Pattern.quote(string) method. This method wraps your pattern between "\\Q" and "\\E" so you can match a string literally even if it happens to have a special regex character in it (+, ., \\d, etc.).
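Because of that, Pattern.quote can still be used for dynamic building if you quote only the literal pieces and leave the regex pieces unquoted. A minimal sketch, where the class QuoteFragments and the method decimalNumber are names I made up for the example:

```java
import java.util.regex.Pattern;

// Hypothetical demo class: quote only the literal fragment, not the whole expression.
class QuoteFragments {
    // Builds "\d+<separator>\d+" where the separator is always treated literally.
    static Pattern decimalNumber(String separator) {
        return Pattern.compile("\\d+" + Pattern.quote(separator) + "\\d+");
    }

    public static void main(String[] args) {
        Pattern p = decimalNumber(".");
        System.out.println(p.pattern());                 // \d+\Q.\E\d+
        System.out.println(p.matcher("1.2").matches());  // true
        System.out.println(p.matcher("1x2").matches());  // false
    }
}
```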

Gray answered Oct 18 '22 15:10