
Issue printing Unicode from Java code in the Windows console

I have a problem printing a Unicode symbol in the Windows console.

Here's the Java code that prints the symbol:

System.out.print("\u22A2 ");

The problem doesn't occur when I run the program in Eclipse with the encoding set to UTF-8. In the Windows console, however, the symbol is replaced by a question mark.

I tried the following to overcome the problem, without success:

  • Changing the font of the Windows console to Lucida Console.

  • Changing the code page every time I start the Windows console, i.e. running chcp 65001.

As an extra step, I also tried a few times to run the class with an argument, i.e. java -Dfile.encoding=UTF-8 Filter (where "Filter" is the name of the class).

Adrian asked Dec 04 '13 21:12




2 Answers

By default, the code page used by the Windows CMD is 437 (on a US English system; other system locales use a different OEM code page). You can check this by running the following command at the prompt:

C:\>chcp
Active code page: 437

This code page prevents Unicode characters from being displayed properly. You have to change the code page to 65001 and also pass -Dfile.encoding=UTF-8:

C:\>chcp 65001
Active code page: 65001
C:\>java -jar -Dfile.encoding=UTF-8 path/to/your/runnable/jar
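
To see why the symbol turns into a question mark, it can help to look at the bytes Java produces for it under different encodings. The following is a minimal sketch, not part of the original answer; the class name EncodingProbe and the helper hex are made up for illustration. It relies on the documented behaviour of String.getBytes(Charset), which substitutes the charset's replacement byte for characters the charset cannot represent.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
public class EncodingProbe {
    // Encodes text with the given charset and returns the bytes as hex, e.g. "E2 8A A2".
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String symbol = "\u22A2";
        // The charset the JVM picked up as its default.
        System.out.println("default charset: " + Charset.defaultCharset());
        // U+22A2 needs three bytes in UTF-8.
        System.out.println("UTF-8 bytes:     " + hex(symbol, StandardCharsets.UTF_8));
        // IBM437 is normally bundled with a full JDK; guarded in case a trimmed runtime omits it.
        if (Charset.isSupported("IBM437")) {
            System.out.println("IBM437 bytes:    " + hex(symbol, Charset.forName("IBM437")));
        }
    }
}
```

If the IBM437 line shows 3F, the character was already replaced by "?" during encoding, before the console ever saw it, so changing the console font or code page alone cannot bring it back.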
spider answered Oct 05 '22 23:10


In addition to the steps you have taken, you also need a PrintStream/PrintWriter that encodes the printed characters as UTF-8.

Unfortunately, the Java designers chose to open the standard streams with the so-called "default" encoding, which is almost always unusable*) under Windows. Hence, using System.out and System.err naively makes your program's output look different depending on where you run it. This runs directly against the goal: compile once, run anywhere.

*) It will be some non-standard "code page" that nobody on this planet except Microsoft recognizes. And AFAIK, if for example you have a German keyboard and a "German" OEM Windows, and you want date and time in your home time zone, there is just no way to say: "But I want UTF-8 input/output in my CMD window." This is one reason why I have my dual-boot Ubuntu running most of the time, where it goes without saying that the terminal does UTF-8.

The following usually works for me in JDK7:

public static PrintWriter stdout = new PrintWriter(
    new OutputStreamWriter(System.out, StandardCharsets.UTF_8),
    true);

For ancient Java versions (before Java 7, which introduced StandardCharsets), I replace StandardCharsets.UTF_8 with Charset.forName("UTF-8").
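
For completeness, here is how the declaration above might look inside a small runnable class. This is only a sketch under assumptions: the class name Utf8Console and the helpers utf8Writer and utf8Stream are invented for illustration. The PrintStream variant is one possible way to apply the same idea to System.out itself, using the long-standing PrintStream(OutputStream, boolean, String) constructor.

```java
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintStream;
import java.io.PrintWriter;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical wrapper class, for illustration only.
public class Utf8Console {
    // Wraps a byte stream in an auto-flushing PrintWriter that always encodes as UTF-8.
    static PrintWriter utf8Writer(OutputStream out) {
        return new PrintWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8), true);
    }

    // Same idea as a PrintStream, so it can be installed with System.setOut.
    static PrintStream utf8Stream(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Variant 1: a separate writer used instead of System.out.
        PrintWriter stdout = utf8Writer(System.out);
        stdout.println("\u22A2");

        // Variant 2: replace System.out so existing print calls emit UTF-8 as well.
        System.setOut(utf8Stream(System.out));
        System.out.println("\u22A2");
    }
}
```

Either variant only controls the bytes Java writes. The console still has to interpret them as UTF-8 (chcp 65001) and use a font that contains the glyph.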

Ingo answered Oct 06 '22 00:10