What is wrong with matcher.find()?

Tags: java, regex

String s = "1.01";
Matcher matcher = Pattern.compile("[+-/\\*\\^\\%]").matcher(s);
if (matcher.find()) {
    System.out.println(matcher.group());
}

The input string is "1.01" and the output is ".". I can't understand why matcher.find() returns true: the input string contains none of the symbols "+", "-", "*", "^" or "%". Why does this happen?

igor.tsutsurupa asked May 29 '13 17:05


2 Answers

Inside a character class, a dash in any position other than the first or last denotes a character range: [a-z] matches every lowercase letter from a to z, whereas [-az] matches only the dash, the a and the z. If you look at http://www.asciitable.com/, you'll see that [+-/] covers the codes 43 through 47, so it matches any of the five characters + , - . / and that is why the "." in "1.01" is found.
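To see the range at work, here is a minimal sketch that lists every printable ASCII character a one-character pattern accepts. The class name DashRangeDemo and the helper matchedAscii are made up for this illustration; only java.util.regex.Pattern comes from the JDK.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question.
final class DashRangeDemo {

    // Returns, in ASCII order, every printable ASCII character
    // that the given regex matches as a whole one-character string.
    static String matchedAscii(String regex) {
        Pattern p = Pattern.compile(regex);
        StringBuilder sb = new StringBuilder();
        for (char c = 32; c < 127; c++) {
            if (p.matcher(String.valueOf(c)).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Dash between '+' (43) and '/' (47): a range of five characters.
        System.out.println(matchedAscii("[+-/]")); // prints +,-./
        // Dash in first position: a literal dash plus 'a' and 'z'.
        System.out.println(matchedAscii("[-az]")); // prints -az
    }
}
```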

Also, you don't have to escape those symbols inside a character class: *, % and ^ (when it is not the first character) are literal there. As said before, your main problem is the position of the dash in the character class.

You can fix your regex from

"[+-/\\*\\^\\%]"

to

"[-+/\\*\\^\\%]"
  ^^

or without the unnecessary escaping:

"[-+/*^%]"
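
As a quick check of the unescaped pattern, the following sketch runs it against the input from the question and against two strings that do contain an operator. The class name OperatorFinder, the helper firstOperator and the extra sample strings are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question.
final class OperatorFinder {

    // Dash placed first, so it is a literal dash and not a range.
    static final Pattern OPERATOR = Pattern.compile("[-+/*^%]");

    // Returns the first operator found in s, or null if there is none.
    static String firstOperator(String s) {
        Matcher m = OPERATOR.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstOperator("1.01"));   // prints null: '.' no longer matches
        System.out.println(firstOperator("1.01-2")); // prints -
        System.out.println(firstOperator("3^2"));    // prints ^
    }
}
```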

jlordo answered Oct 16 '22 05:10


I'm pretty sure you have to escape the -. Inside a character class, - is the range symbol, as in [0-9], so it needs to be escaped (or moved to the first or last position) if you want to match a literal dash.

If you reorder the symbols inside the class, you can write the entire pattern without any escapes. [-+*^%] should work and is a bit easier to read (add / to the class if you also need to match the slash from the original pattern).
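
Both approaches, escaping the dash or moving it to the front, should accept the same characters. A small sketch to compare them, assuming a made-up class EscapedDashDemo and helper matchedAscii:

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
final class EscapedDashDemo {

    // Returns, in ASCII order, every printable ASCII character the pattern matches.
    static String matchedAscii(Pattern p) {
        StringBuilder sb = new StringBuilder();
        for (char c = 32; c < 127; c++) {
            if (p.matcher(String.valueOf(c)).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Pattern escaped = Pattern.compile("[+\\-*^%]"); // dash escaped in the middle
        Pattern reordered = Pattern.compile("[-+*^%]"); // dash moved to the front
        System.out.println(matchedAscii(escaped));   // prints %*+-^
        System.out.println(matchedAscii(reordered)); // prints %*+-^
    }
}
```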

Walls answered Oct 16 '22 07:10