
Strange behavior of String's matches() method

Tags: java, string, regex

I ran into an interesting issue with String's matches(regex) method.

assertTrue("33CCFF".matches("[0-9A-Za-z]{6}"));
assertTrue("CC33FF".matches("[0-9A-Za-z]{6}"));
assertTrue("CC3355".matches("[0-9A-Za-z]{6}"));
assertTrue("CC9955".matches("[0-9A-Za-z]{6}"));
assertTrue("CC3366".matches("[0-9A-Za-z]{6}"));
assertTrue("CC3965".matches("[0-9A-Za-z]{6}"));
assertTrue("CC1961".matches("[0-9A-Za-z]{6}"));
assertTrue("CC9999".matches("[0-9A-Za-z]{6}"));
assertTrue("СС3966".matches("[0-9A-Za-z]{6}")); // failing
assertTrue("СС9965".matches("[0-9A-Za-z]{6}")); // failing
assertTrue("СС9966".matches("[0-9A-Za-z]{6}")); // failing

The last three assertions fail unexpectedly. I couldn't find any reason for this behavior. Can you reproduce it? Do you have any ideas?

In case it matters, my Java version is the following.

java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)
lemiorhan asked Oct 05 '11 16:10


2 Answers

The last three don't actually start with an ASCII "C" character. They start with a non-ASCII character which looks like "C". That doesn't match anything in the [0-9A-Za-z] set, hence the pattern fails.

(I found this out by copying and pasting the code into a text editor which doesn't handle non-ASCII characters terribly well - they came out as "?".)
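
One way to confirm this without relying on an editor is to print the code point of each character. The sketch below is only illustrative: the class and method names (HomoglyphCheck, firstNonAscii, codePoints) are hypothetical, and it assumes the look-alike character is the Cyrillic capital letter Es (U+0421), which is what the pasted strings appear to contain. The look-alike string is written with Unicode escapes so the difference stays visible in the source.

```java
public class HomoglyphCheck {

    /** Returns the index of the first non-ASCII char in s, or -1 if every char is ASCII. */
    public static int firstNonAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                return i;
            }
        }
        return -1;
    }

    /** Formats each char as U+XXXX so that look-alike letters become distinguishable. */
    public static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String latin = "CC3966";                 // typed with Latin C (U+0043)
        String lookalike = "\u0421\u04213966";   // assumed: Cyrillic Es (U+0421), renders like C

        System.out.println(codePoints(latin));
        System.out.println(codePoints(lookalike));

        // Only the all-ASCII string matches the character class.
        System.out.println(latin.matches("[0-9A-Za-z]{6}"));
        System.out.println(lookalike.matches("[0-9A-Za-z]{6}"));
    }
}
```

Running firstNonAscii on each literal from the failing assertions shows where the unexpected character sits; retyping that character from the keyboard should make the assertion pass.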

Jon Skeet answered Oct 18 '22 20:10


Your "СС3966" (I'm cutting and pasting) contains non-ASCII characters, which is why the regex isn't matching it. When I replace your text by typing it myself, it works as expected. I'm not sure where you copied these values from, but that's your problem.

Yevgeny Simkin answered Oct 18 '22 21:10