More Unicode characters in Windows console than expected

I want to print Russian and German characters in the Windows console, so I wrote a small test program to see how well it works:

PrintStream ps = new PrintStream(System.out, false, "UTF-8");
ps.println("öäüß гджщ");

Then I started cmd.exe, changed its font to Lucida Console (which supports Unicode), switched the code page to UTF-8 with "chcp 65001", and ran my program.

The German and Russian characters were printed, but there was a little more text than I expected (underlined in red in the original screenshot).

The text is printed correctly in the Eclipse console, though. Is there a way to print it correctly in the Windows console? I use Windows 7.

I've just solved the problem with JNI, but I am still interested in whether it is doable in pure Java.

ka3ak asked Dec 06 '12 12:12


1 Answer

Every time you open or write a file, some encoding is applied. But we sometimes forget that our IDE (Eclipse in your case) has an encoding setting as well.

When you type text between quotes, it is displayed and stored in the encoding of your IDE. Your assumption is that setting the encoding of the output stream (UTF-8) guarantees that the text is displayed in that encoding. However, I think the encoding of your IDE comes into play here as well.

I suggest double-checking the encoding configured in Eclipse. Perhaps this solves your problem. Certainly worth a try, isn't it? :)

For a global encoding setting, add the following line to the eclipse.ini file (JVM options go after the -vmargs line):
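To make the point about encodings concrete, here is a minimal sketch showing that the same character becomes different bytes under different charsets. The class name EncodingDemo and the helper toHex are made up for this illustration. The character is written as a Unicode escape, so the result does not depend on the encoding in which the source file itself was saved.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Encodes the text with the given charset and returns the bytes as hex pairs.
    static String toHex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // U+00F6 is the German o-umlaut from the question's test string.
        String text = "\u00F6";
        System.out.println("UTF-8:      " + toHex(text, StandardCharsets.UTF_8));      // C3 B6
        System.out.println("ISO-8859-1: " + toHex(text, StandardCharsets.ISO_8859_1)); // F6
    }
}
```

If a source file is saved in one encoding and compiled as if it were another, string literals containing such characters can end up garbled before any output stream is involved.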

-Dfile.encoding=UTF-8 
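To check which encoding a JVM actually picked up, a small program like the following may help. This is only a sketch: the class name ShowDefaultEncoding and the method describe are invented for the example. Keep in mind that a program launched from Eclipse may get its encoding from the workspace or launch configuration rather than from eclipse.ini, so it is worth running the check both inside Eclipse and from cmd.exe.

```java
import java.nio.charset.Charset;

public class ShowDefaultEncoding {
    // Builds a short report of the encoding-related settings of the running JVM.
    static String describe() {
        return "file.encoding = " + System.getProperty("file.encoding")
                + ", default charset = " + Charset.defaultCharset();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```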

EDIT:

I would like to add one more thing. As an experiment, I performed the following steps:

  1. I opened Notepad++ and created a new file
  2. I modified the encoding setting to UTF-8
  3. I copied your Russian text and pasted it in my new text file and saved it.
  4. Next I opened my Windows console ("cmd")
  5. I executed the "chcp 65001" command.
  6. Next I printed the content of the file in my console: "type file.txt"
  7. Everything shows correctly.

This does not prove much, but it does confirm that the Windows console can display the text if the content is supplied in the right encoding.
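The same experiment can be repeated without Notepad++ by letting Java write the file. The following is a sketch under the assumption that the working directory is writable. The class name WriteUtf8File and the helper utf8Bytes are hypothetical, and file.txt is the file name used in the steps above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class WriteUtf8File {
    // Encodes the text as UTF-8, independent of the platform default charset.
    static byte[] utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // The Russian sample text from the question, written as Unicode escapes
        // so that the encoding of this source file does not matter.
        String russian = "\u0433\u0434\u0436\u0449";
        Files.write(Paths.get("file.txt"), utf8Bytes(russian));
    }
}
```

After running it, "chcp 65001" followed by "type file.txt" in cmd.exe repeats steps 5 and 6.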

EDIT2:

@ka3ak It's been over 2 years, but while reading a book about Java I/O I stumbled upon the following.

System.console().printf(...) has better support for special characters than the System.out.println(...) method.

Since your PrintStream just wraps the System.out stream, I guess you have the same limitations. I wonder whether this could have solved the problem. If it still matters, please give it a try. :)

Other posts on Stack Overflow report similar things: console.writeline and System.out.println
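A minimal sketch of how this could be tried is shown below. System.console() can return null when no interactive console is attached (for example when output is redirected, or inside some IDEs), so the sketch falls back to the PrintStream approach from the question. The class name ConsoleOutput and the method printText are made up for the example, and whether this removes the extra characters on Windows 7 has not been verified here.

```java
import java.io.Console;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class ConsoleOutput {
    // Prints via System.console() when one is available, otherwise via a
    // UTF-8 PrintStream around System.out. Returns which path was taken.
    static String printText(String text) {
        Console console = System.console();
        if (console != null) {
            console.printf("%s%n", text);
            return "console";
        }
        try {
            PrintStream ps = new PrintStream(System.out, true, "UTF-8");
            ps.println(text);
            return "stream";
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // German and Russian sample text from the question, as Unicode escapes.
        String text = "\u00F6\u00E4\u00FC\u00DF \u0433\u0434\u0436\u0449";
        String path = printText(text);
        System.err.println("printed via: " + path);
    }
}
```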

bvdb answered Sep 23 '22 04:09