
Escape path separator in a regular expression

I need to write a regular expression that finds JavaScript files whose paths match this shape:

<anypath><slash>js<slash><anything>.js

For example, it should match both:

  • c:\mysite\js\common.js (Windows)
  • /var/www/mysite/js/common.js (UNIX)

The problem is that the Windows file separator is not being escaped properly:

pattern = Pattern.compile(
     "^(.+?)" + 
     File.separator +
     "js" +
     File.separator +
     "(.+?).js$" );

This throws:

java.util.regex.PatternSyntaxException: Illegal/unsupported escape sequence

Is there any way to use a common regular expression that works on both Windows and UNIX systems?

Guido asked Oct 28 '08 10:10


2 Answers

Does Pattern.quote(File.separator) do the trick?

EDIT: Pattern.quote is available as of Java 1.5. For 1.4, you need to escape the file separator character yourself:

"\\" + File.separator

Escaping punctuation characters will not break anything, but escaping letters or numbers unconditionally will either change them to their special meaning or lead to a PatternSyntaxException. (Thanks Alan M for pointing this out in the comments!)
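To make the Pattern.quote suggestion concrete, here is a minimal sketch. The class name QuotedSeparatorExample and the helper jsFilePattern are made up for illustration; taking the separator as a parameter is my own assumption so that both styles can be tried on one machine. It also escapes the dot before "js", which the question's pattern leaves unescaped.

```java
import java.io.File;
import java.util.regex.Pattern;

// Hypothetical example class, not part of the original question.
class QuotedSeparatorExample {

    // Builds the question's pattern for the given separator.
    // Pattern.quote turns the separator into a literal, so a Windows
    // backslash no longer starts a regex escape sequence.
    static Pattern jsFilePattern(String separator) {
        String sep = Pattern.quote(separator);
        return Pattern.compile("^(.+?)" + sep + "js" + sep + "(.+?)\\.js$");
    }

    public static void main(String[] args) {
        // File.separator is "\" on Windows and "/" on UNIX-like systems.
        Pattern pattern = jsFilePattern(File.separator);

        // Build a test path using the separator of the current platform.
        String path = String.join(File.separator, "mysite", "js", "common.js");
        System.out.println(path + " matches: " + pattern.matcher(path).matches());
    }
}
```

Note that a pattern built this way only accepts the separator of the platform it runs on, so a Windows-style path will not match on a UNIX machine and vice versa.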

Tomalak answered Oct 13 '22 18:10


Is there any way to use a common regular expression that works on both Windows and UNIX systems?

Yes, just use a regex that matches both kinds of separator.

pattern = Pattern.compile(
    "^(.+?)" + 
    "[/\\\\]" +
    "js" +
    "[/\\\\]" +
    "(.+?)\\.js$" );

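It's safe on Windows because neither character is permitted in a file or directory name there. On Unix, / cannot appear in a name either; a backslash technically can, but that is rare enough that treating it as a separator is usually harmless.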
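As a quick check of the pattern above, this sketch runs it against the two paths from the question plus one path that should not match. The class name EitherSeparatorExample and the constant JS_FILE are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class, not part of the original answer.
class EitherSeparatorExample {

    // Same pattern as above, written as one string. In Java source,
    // "[/\\\\]" is the regex [/\\], which matches either / or \.
    static final Pattern JS_FILE =
            Pattern.compile("^(.+?)[/\\\\]js[/\\\\](.+?)\\.js$");

    public static void main(String[] args) {
        String[] paths = {
            "c:\\mysite\\js\\common.js",      // Windows path from the question
            "/var/www/mysite/js/common.js",   // UNIX path from the question
            "/var/www/mysite/css/site.css"    // not under a js directory
        };
        for (String path : paths) {
            Matcher m = JS_FILE.matcher(path);
            if (m.matches()) {
                // group(1) is everything before the js directory,
                // group(2) is the file name without the .js extension.
                System.out.println(path + " -> " + m.group(1) + " | " + m.group(2));
            } else {
                System.out.println(path + " -> no match");
            }
        }
    }
}
```

Unlike a pattern built from File.separator, this one behaves the same on every platform, which helps when the paths being checked do not come from the local file system.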

Alan Moore answered Oct 13 '22 17:10