Is there any method in Java or any open source library for escaping (not quoting) a special character (meta-character), in order to use it as a regular expression?
This would be very handy in dynamically building a regular expression, without having to manually escape each individual character.
For example, consider a simple regex like \d+\.\d+ that matches numbers with a decimal point like 1.2, as well as the following code:
String digit = "d";
String point = ".";
String regex1 = "\\d+\\.\\d+";
String regex2 = Pattern.quote(digit + "+" + point + digit + "+");

Pattern numbers1 = Pattern.compile(regex1);
Pattern numbers2 = Pattern.compile(regex2);

System.out.println("Regex 1: " + regex1);
if (numbers1.matcher("1.2").matches()) {
    System.out.println("\tMatch");
} else {
    System.out.println("\tNo match");
}

System.out.println("Regex 2: " + regex2);
if (numbers2.matcher("1.2").matches()) {
    System.out.println("\tMatch");
} else {
    System.out.println("\tNo match");
}
Not surprisingly, the output produced by the above code is:
Regex 1: \d+\.\d+
	Match
Regex 2: \Qd+.d+\E
	No match
That is, regex1 matches 1.2 but regex2 (which is "dynamically" built) does not (instead, it matches the literal string d+.d+).
So, is there a method that would automatically escape each regex meta-character?
If there were, let's say, a static escape() method in java.util.regex.Pattern, the output of Pattern.escape('.') would be the string "\.", but Pattern.escape(',') should just produce ",", since it is not a meta-character. Similarly, Pattern.escape('d') could produce "\d", since 'd' is used to denote digits (although escaping may not make sense in this case, as 'd' could mean literal 'd', which wouldn't be misunderstood by the regex interpreter to be something else, as would be the case with '.').
To escape a metacharacter you use the Java regular expression escape character - the backslash character. Escaping a character means preceding it with the backslash character. For instance, like this: \.
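As a minimal sketch of the difference the backslash makes, the following compares an unescaped and an escaped dot. The class and method names are made up for illustration. Note that the regex \. has to be written as "\\." in Java source, because the Java compiler consumes one level of backslashes first.

```java
import java.util.regex.Pattern;

final class EscapedDotDemo {
    // Unescaped: '.' is a metacharacter that matches any single character.
    static final Pattern ANY_CHAR = Pattern.compile("1.2");
    // Escaped: "\\." in Java source is the regex \. and matches only a literal dot.
    static final Pattern LITERAL_DOT = Pattern.compile("1\\.2");

    static boolean matchesUnescaped(String s) {
        return ANY_CHAR.matcher(s).matches();
    }

    static boolean matchesEscaped(String s) {
        return LITERAL_DOT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesUnescaped("1x2")); // true
        System.out.println(matchesEscaped("1x2"));   // false
        System.out.println(matchesEscaped("1.2"));   // true
    }
}
```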
The Java regex pattern \\s+ matches one or more consecutive whitespace characters in the value being searched. It is \\s, which matches a single whitespace character, followed by the + quantifier.
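A short sketch of \\s+ in use, with hypothetical helper names: one method splits a string on runs of whitespace, the other collapses each run to a single space.

```java
import java.util.Arrays;

final class WhitespaceDemo {
    // Splits on runs of one or more whitespace characters (spaces, tabs, newlines).
    static String[] splitOnWhitespace(String text) {
        return text.trim().split("\\s+");
    }

    // Replaces each run of whitespace with a single space.
    static String collapseWhitespace(String text) {
        return text.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String input = "a  b\t\nc";
        System.out.println(Arrays.toString(splitOnWhitespace(input))); // [a, b, c]
        System.out.println(collapseWhitespace(input));                 // a b c
    }
}
```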
If you are looking for a way to create constants that you can use in your regex patterns, then just prepending them with "\\" should work, but there is no nice Pattern.escape('.') function to help with this.
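If you want something close to the escape() you describe, a small helper is easy to write yourself. The sketch below is hypothetical: RegexEscaper is not part of the JDK, and the metacharacter list is an assumption covering the characters that are special outside a character class. Letters such as 'd' are deliberately left alone, because putting a backslash in front of a letter changes its meaning rather than neutralising it.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not a JDK class.
final class RegexEscaper {
    // Assumed set of metacharacters (outside character classes): \ ^ $ . | ? * + ( ) [ ] { }
    private static final String META = "\\^$.|?*+()[]{}";

    // Returns the character prefixed with a backslash if it is in META, otherwise unchanged.
    static String escape(char c) {
        return META.indexOf(c) >= 0 ? "\\" + c : String.valueOf(c);
    }

    // Escapes every metacharacter in the given string.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() * 2);
        for (int i = 0; i < s.length(); i++) {
            sb.append(escape(s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String regex = "\\d+" + escape('.') + "\\d+";
        System.out.println(regex);                         // \d+\.\d+
        System.out.println(Pattern.matches(regex, "1.2")); // true
        System.out.println(Pattern.matches(regex, "1x2")); // false
        System.out.println(escape(','));                   // ,
    }
}
```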
So if you are trying to match "\\d" (the literal string \d instead of a decimal digit) then you would do:
// this will match on \d as opposed to a decimal digit
String matchBackslashD = "\\\\d";
// as opposed to
String matchDecimalDigit = "\\d";
The four backslashes in the Java string literal become two backslashes in the regex pattern, and two backslashes in a regex pattern match a single literal backslash. Prepending a backslash to any special character turns it into a normal character instead of a special one.
String matchPeriod = "\\.";
String matchPlus = "\\+";
String matchParens = "\\(\\)";
...
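To check the two levels of escaping, here is a small sketch that runs both the four-backslash and the two-backslash pattern against a digit and against the literal text \d. The class and constant names are illustrative only.

```java
final class BackslashLevelsDemo {
    // Java literal "\\\\d" is the 3-character regex \\d: a literal backslash followed by 'd'.
    static final String MATCH_BACKSLASH_D = "\\\\d";
    // Java literal "\\d" is the 2-character regex \d: any decimal digit.
    static final String MATCH_DECIMAL_DIGIT = "\\d";

    public static void main(String[] args) {
        String backslashD = "\\d"; // the two characters \ and d

        System.out.println(MATCH_BACKSLASH_D.length());              // 3
        System.out.println(backslashD.matches(MATCH_BACKSLASH_D));   // true
        System.out.println("7".matches(MATCH_BACKSLASH_D));          // false
        System.out.println("7".matches(MATCH_DECIMAL_DIGIT));        // true
        System.out.println(backslashD.matches(MATCH_DECIMAL_DIGIT)); // false
    }
}
```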
In your post you use the Pattern.quote(string) method. This method wraps your pattern between "\\Q" and "\\E" so you can match a string literally even if it happens to have special regex characters in it (+, ., \\d, etc.).
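One way to still use Pattern.quote when building a regex dynamically is to quote only the fragments that must be literal and leave the rest as ordinary regex syntax. This is a sketch under that assumption, and decimalWithSeparator is a made-up helper name.

```java
import java.util.regex.Pattern;

final class QuoteFragmentDemo {
    // Builds digits + literal separator + digits; only the separator is quoted.
    static Pattern decimalWithSeparator(String separator) {
        return Pattern.compile("\\d+" + Pattern.quote(separator) + "\\d+");
    }

    public static void main(String[] args) {
        Pattern numbers = decimalWithSeparator(".");
        System.out.println(numbers.pattern());                // \d+\Q.\E\d+
        System.out.println(numbers.matcher("1.2").matches()); // true
        System.out.println(numbers.matcher("1x2").matches()); // false

        // Quoting the whole expression, as in the question, matches only the literal text.
        Pattern literal = Pattern.compile(Pattern.quote("d+.d+"));
        System.out.println(literal.matcher("d+.d+").matches()); // true
        System.out.println(literal.matcher("1.2").matches());   // false
    }
}
```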