
Greek String doesn't match regex when read from keyboard

Tags:

java

regex

public static void main(String[] args) throws IOException {
   String str1 = "ΔΞ123456";
   System.out.println(str1+"-"+str1.matches("^\\p{InGreek}{2}\\d{6}")); //ΔΞ123456-true

   BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
   String str2 = br.readLine(); //ΔΞ123456 same as str1.
   System.out.println(str2+"-"+str2.matches("^\\p{InGreek}{2}\\d{6}")); //Δ�123456-false

   System.out.println(str1.equals(str2)); //false
}

The same string matches the regex when it is a literal, but not when it is read from the keyboard.
What causes this problem, and how can it be solved?
Thanks in advance.

EDIT: I switched to System.console() for both input and output, and the strings now match.

public static void main(String[] args) throws IOException {
        PrintWriter pr = System.console().writer();

        String str1 = "ΔΞ123456";
        pr.println(str1+"-"+str1.matches("^\\p{InGreek}{2}\\d{6}")+"-"+str1.length());

        String str2 = System.console().readLine();
        pr.println(str2+"-"+str2.matches("^\\p{InGreek}{2}\\d{6}")+"-"+str2.length());

        pr.println("str1.equals(str2)="+str1.equals(str2));
}

Output:

ΔΞ123456-true-8
ΔΞ123456
ΔΞ123456-true-8
str1.equals(str2)=true

athspk asked Jan 02 '11 15:01



2 Answers

There are multiple places where transcoding errors can take place here.

  1. Ensure that your class is being compiled correctly (unlikely to be an issue in an IDE):
    • Ensure that the compiler is using the same encoding as your editor (i.e. if you save as UTF-8, set your compiler to use that encoding)
    • Or escape the non-ASCII characters so the source uses only the ASCII subset that most encodings share (i.e. change the string literal to "\u0394\u039e123456")
  2. Ensure you are reading input using the correct encoding:
    • Use the Console to read input - this class will detect the console encoding
    • Or configure your Reader to use the correct encoding (probably windows-1253) or set the console to Java's default encoding
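A minimal sketch of the escaping option from step 1. The source file below contains only ASCII, so the compiler's source encoding should not be able to alter the literal. The class name `EscapedLiteralDemo` and the helper `matchesPattern` are made up for illustration; the pattern is the one from the question.

```java
public class EscapedLiteralDemo {
    // Pattern from the question: two characters from the Greek block, then six digits.
    static final String PATTERN = "^\\p{InGreek}{2}\\d{6}";

    // Greek capital delta and xi written as Unicode escapes, followed by the digits.
    static final String ESCAPED = "\u0394\u039e123456";

    // Hypothetical helper so the check can be reused.
    static boolean matchesPattern(String s) {
        return s.matches(PATTERN);
    }

    public static void main(String[] args) {
        // Prints the match result and the length rather than the Greek text itself,
        // so the output does not depend on the console encoding.
        System.out.println(matchesPattern(ESCAPED) + "-" + ESCAPED.length());
    }
}
```

This only rules out the compile-time half of the problem; the input side still has to be decoded with the right encoding.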

Note that System.console() returns null in an IDE, but there are things you can do about that.
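One possible way to cope with a null Console, sketched under assumptions rather than taken from this answer: use the Console when it exists, and otherwise decode System.in with a charset the caller names. The class `ConsoleFallbackDemo`, the helper `readLine`, and the choice of UTF-8 as the default fallback are all hypothetical.

```java
import java.io.BufferedReader;
import java.io.Console;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

public class ConsoleFallbackDemo {
    // Hypothetical helper: reads a single line. Uses the Console when there is
    // one; otherwise decodes the raw stream with the charset given by the caller.
    static String readLine(Console console, InputStream in, Charset fallback) {
        if (console != null) {
            return console.readLine();
        }
        try {
            BufferedReader br = new BufferedReader(new InputStreamReader(in, fallback));
            return br.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: the fallback charset name is the first argument, UTF-8 otherwise.
        Charset fallback = Charset.forName(args.length > 0 ? args[0] : "UTF-8");
        String line = readLine(System.console(), System.in, fallback);
        if (line != null) {
            System.out.println(line.matches("^\\p{InGreek}{2}\\d{6}"));
        }
    }
}
```

The helper creates a new BufferedReader per call, which is fine for reading one line; a program that reads many lines would want to create the reader once and keep it.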

McDowell answered Oct 17 '22 06:10


If you use Windows, the cause may be that the console character encoding (the "OEM code page") is not the same as the system encoding (the "ANSI code page").

An InputStreamReader constructed without an explicit encoding parameter assumes the input is in the system default encoding, so characters read from the console are decoded incorrectly.
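The mismatch can be reproduced without a console. The sketch below encodes the string as the console would send it and decodes it as the reader would. It assumes CP737 as the console (OEM) code page and windows-1253 as the system (ANSI) code page, which are the two encodings named in these answers; the class `CodePageMismatchDemo` and the helper `roundTrip` are made up for illustration.

```java
import java.nio.charset.Charset;

public class CodePageMismatchDemo {
    // Pattern from the question: two characters from the Greek block, then six digits.
    static final String PATTERN = "^\\p{InGreek}{2}\\d{6}";

    // Hypothetical helper: encode the text with one charset, decode it with another.
    static String roundTrip(String text, Charset consoleCharset, Charset readerCharset) {
        byte[] raw = text.getBytes(consoleCharset);
        return new String(raw, readerCharset);
    }

    public static void main(String[] args) {
        String typed = "\u0394\u039e123456";
        Charset oem = Charset.forName("CP737");          // assumed console code page
        Charset ansi = Charset.forName("windows-1253");  // assumed system code page

        String wrong = roundTrip(typed, oem, ansi);  // reader uses the wrong charset
        String right = roundTrip(typed, oem, oem);   // reader uses the console charset

        System.out.println("mismatched: matches=" + wrong.matches(PATTERN)
                + ", equals=" + typed.equals(wrong));
        System.out.println("matched:    matches=" + right.matches(PATTERN)
                + ", equals=" + typed.equals(right));
    }
}
```

With these two charsets the mismatched decode should neither equal the original nor match the pattern, because the bytes that mean delta and xi in CP737 do not map to Greek letters in windows-1253.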

To read non-US-ASCII characters correctly in the Windows console, you need to specify the console encoding explicitly when constructing the InputStreamReader (the required code page number can be found by running mode con cp on the command line):

BufferedReader br = new BufferedReader(
    new InputStreamReader(System.in, "CP737")); 

The same problem applies to the output, so you need to construct the PrintWriter with the proper encoding:

PrintWriter out = new PrintWriter(new OutputStreamWriter(System.out, "CP737"));

Note that since Java 1.6 you can avoid these workarounds by using the Console object obtained from System.console(). It provides a Reader and a Writer with correctly configured encoding, as well as some utility methods.

However, System.console() returns null when the streams are redirected (for example, when running from an IDE). A workaround for this problem can be found in McDowell's answer.

See also:

  • Code page

axtavt answered Oct 17 '22 08:10