
java, regular expression, need to escape backslash in regex

Tags: java, regex

With reference to the question below: String.replaceAll single backslashes with double backslashes

I wrote a test program and found that the result is true in both cases, whether I escape the backslash or not. This may be because:

- \t is a recognized Java String escape sequence. (Try \s and the compiler would complain.)
- \t is taken as a literal tab in the regex.

I am somewhat unsure of the reasons.

Is there any general guideline about escaping in Java regexes? I think using two backslashes is the correct approach.

I would still like to know your opinions.

public class TestDeleteMe {

  public static void main(String[] args) {
    System.out.println(System.currentTimeMillis());

    String str1 = "a\tb"; // tab between a and b

    // pattern: a and b with any number of spaces or tabs between
    // "\\t": the regex engine receives a backslash followed by t
    System.out.println("matches = " + str1.matches("^a[ \\t]*b$"));
    // "\t": the Java compiler puts a literal tab into the pattern
    System.out.println("matches = " + str1.matches("^a[ \t]*b$"));
  }
}
RuntimeException asked Feb 02 '12 13:02



1 Answer

Escape sequences are interpreted twice: first by the Java compiler, and then by the regexp engine. When the Java compiler sees two backslashes in a string literal, it replaces them with a single backslash. When a t follows a single backslash, the compiler replaces the pair with a tab character; when a t follows a double backslash, the compiler leaves the t alone. In that second case, because the two backslashes have been reduced to one, the regexp engine sees \t and interprets it as a tab. Either way the pattern ends up matching a tab, which is why both calls return true.
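The two stages can be made visible by looking at the strings the regexp engine actually receives. The following is only a minimal sketch: the class name EscapeLevels and its helper methods are made up for illustration and are not part of the question's program.

```java
import java.util.regex.Pattern;

public class EscapeLevels {

  // Java source "\\t": the compiler reduces the two backslashes to one,
  // so the regexp engine receives a backslash followed by the letter t.
  static String regexLevelTab() {
    return "^a[ \\t]*b$";
  }

  // Java source "\t": the compiler has already replaced the escape
  // with a real tab character before the regexp engine sees anything.
  static String compilerLevelTab() {
    return "^a[ \t]*b$";
  }

  // Checks the given pattern source against "a<TAB>b".
  static boolean matchesTabbed(String patternSource) {
    return Pattern.matches(patternSource, "a\tb");
  }

  public static void main(String[] args) {
    // The two pattern strings differ in length by one character.
    System.out.println(regexLevelTab().length());    // 10
    System.out.println(compilerLevelTab().length()); // 9

    // Both still match a string with a tab between a and b.
    System.out.println(matchesTabbed(regexLevelTab()));    // true
    System.out.println(matchesTabbed(compilerLevelTab())); // true
  }
}
```

The one-character difference in length is the backslash that survives the compiler in the first form and is then consumed by the regexp engine.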

I think it is cleaner to let the regexp engine interpret \t as a tab (i.e. write "\\t" in Java) because it lets you see the expression in its intended form during debugging, logging, etc. If you convert a Pattern written with "\t" to a string, you will see a raw tab character in the middle of your regular expression and may mistake it for other whitespace. Patterns written with "\\t" do not have this problem: they show \t with a single backslash, telling you exactly what kind of whitespace they match.
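To illustrate the logging point, here is a small sketch that prints both patterns back out. The class name PatternLogging and its methods are hypothetical; Pattern.compile and Pattern.pattern() are the standard java.util.regex calls.

```java
import java.util.regex.Pattern;

public class PatternLogging {

  // Written as "\\t" in source: the stored pattern text contains
  // a visible backslash followed by t.
  static String readable() {
    return Pattern.compile("^a[ \\t]*b$").pattern();
  }

  // Written as "\t" in source: the stored pattern text contains
  // a raw tab character and no backslash at all.
  static String opaque() {
    return Pattern.compile("^a[ \t]*b$").pattern();
  }

  public static void main(String[] args) {
    // Brackets make the boundaries of each pattern visible in the output.
    System.out.println("[" + readable() + "]");
    System.out.println("[" + opaque() + "]");
  }
}
```

In a log, the first line shows \t explicitly, while the second shows only a stretch of whitespace that is hard to tell apart from ordinary spaces.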

Sergey Kalinichenko answered Oct 24 '22 09:10