 

Canonical equivalence in Pattern

Tags:

java

regex

I am using the test harness listed here: http://docs.oracle.com/javase/tutorial/essential/regex/test_harness.html

The only change I made to the class is that the pattern is created as below:

Pattern pattern = 
        Pattern.compile(console.readLine("%nEnter your regex(Pattern.CANON_EQ set): "),Pattern.CANON_EQ);

As the tutorial at http://docs.oracle.com/javase/tutorial/essential/regex/pattern.html suggests, I entered the regex a\u030A and the string to match \u00E5, but the harness prints "No match found." As far as I can tell, both strings represent a lowercase 'a' with a ring above (å).

Have I misunderstood the use case?

asked Apr 22 '12 at 05:04 by Asif



1 Answer

The behavior you're seeing has nothing to do with the Pattern.CANON_EQ flag.

Input read from the console is not the same as a Java string literal. When the user (presumably you, testing out this flag) types \u00E5 into the console, the resulting string returned by console.readLine is equivalent to "\\u00E5" (six characters: a backslash, 'u', and four hex digits), not "å". See for yourself: http://ideone.com/lF7D1
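One way to make the harness behave like the tutorial's string literals is to translate the typed escape sequences yourself before compiling and matching. The sketch below uses a hypothetical helper, unescapeUnicode, which is not part of the JDK or the tutorial's harness. It naively replaces every backslash + 'u' + four hex digits with the character that code names, and it does not handle a deliberately escaped backslash in front of the 'u'. It is applied to both the regex and the input so the two sides are treated the same way.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ConsoleEscapes {
    // Hypothetical helper (not part of the JDK): replaces each backslash + 'u' +
    // four hex digits with the UTF-16 code unit it names, roughly what javac does
    // for escapes in source code.
    static String unescapeUnicode(String s) {
        Matcher m = Pattern.compile("\\\\u([0-9A-Fa-f]{4})").matcher(s);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // What console.readLine() returns when you type the characters
        // backslash, u, 0, 0, E, 5 (written here with an escaped backslash).
        String typedInput = "\\u00E5";
        String typedRegex = "a\\u030A";
        System.out.println(typedInput.length()); // 6, not 1

        // After unescaping, both sides hold real characters, so CANON_EQ applies.
        Pattern p = Pattern.compile(unescapeUnicode(typedRegex), Pattern.CANON_EQ);
        System.out.println(p.matcher(unescapeUnicode(typedInput)).matches()); // true
    }
}
```

In the harness itself, this would mean passing the string returned by each console.readLine call through unescapeUnicode before handing it to Pattern.compile or to matcher.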

As for Pattern.CANON_EQ, it behaves exactly as described:

import java.util.regex.Pattern;

public class CanonEqDemo {
    public static void main(String[] args) {
        Pattern withCE = Pattern.compile("^a\u030A$", Pattern.CANON_EQ);
        Pattern withoutCE = Pattern.compile("^a\u030A$");
        String input = "\u00E5";

        System.out.println("Matches with canon eq: "
            + withCE.matcher(input).matches()); // true
        System.out.println("Matches without canon eq: "
            + withoutCE.matcher(input).matches()); // false
    }
}

http://ideone.com/nEV1V
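The same equivalence can be checked directly with java.text.Normalizer. That also suggests an alternative to the flag: normalize the input to the form the pattern is written in. This is a sketch, not a description of how CANON_EQ is implemented internally. canonicallyEquivalent is an illustrative helper name, and the flagless approach assumes the pattern's literals are themselves in NFD (as "a" followed by U+030A is).

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

public class CanonicalEquivalenceCheck {
    // Illustrative helper: two strings are canonically equivalent
    // if their NFC forms are identical.
    static boolean canonicallyEquivalent(String x, String y) {
        return Normalizer.normalize(x, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(y, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        String decomposed = "a\u030A";  // 'a' followed by COMBINING RING ABOVE
        String precomposed = "\u00E5";  // LATIN SMALL LETTER A WITH RING ABOVE

        System.out.println(decomposed.equals(precomposed));                 // false
        System.out.println(canonicallyEquivalent(decomposed, precomposed)); // true

        // Alternative to CANON_EQ: the pattern below is already in NFD, so
        // decomposing the input to NFD lets a plain (flagless) match succeed.
        Pattern plain = Pattern.compile("^a\u030A$");
        String nfdInput = Normalizer.normalize(precomposed, Normalizer.Form.NFD);
        System.out.println(plain.matcher(nfdInput).matches());              // true
    }
}
```

Normalizing up front keeps the regex engine out of the Unicode business entirely, at the cost of having to make sure every pattern literal is written in the same normalization form as the input.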

answered Nov 12 '22 at 05:11 by Matt Ball
