
Java, UTF-8, and Windows console

We are trying to use Java with UTF-8 on Windows. The application writes its logs to the console, and because the logs are internationalized we would like them to be in UTF-8.

It is possible to configure the JVM so that it generates UTF-8 by passing -Dfile.encoding=UTF-8 as an argument to the JVM. That works, but the output in a Windows console is garbled.
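For illustration, here is a minimal sketch of forcing UTF-8 on the console stream from inside the program rather than through the JVM flag. The class name Utf8Console and the helper utf8Stream are hypothetical names, not part of our application. The console still has to interpret the bytes as UTF-8 for the text to display correctly.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class Utf8Console {
    // Hypothetical helper: wraps a stream in a PrintStream that always
    // encodes text as UTF-8, whatever the JVM default charset is.
    static PrintStream utf8Stream(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream console = utf8Stream(System.out);
        // \u00e9 is "é"; the escape avoids depending on the source file encoding.
        console.println("D\u00e9marrage de l'application");
    }
}
```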

We can then set the code page of the console to 65001 (chcp 65001), but in that case .bat files stop working: when we try to launch our application through our script (named start.bat), absolutely nothing happens. The command simply returns:

C:\Application> chcp 65001
Activated code page: 65001
C:\Application> start.bat

C:\Application>

But without chcp 65001, there is no problem, and the application can be launched.

Any hints about that?

asked Sep 10 '08 18:09 by tofcoder




1 Answer

Try chcp 65001 && start.bat

The chcp command changes the code page, and 65001 is the Win32 code page identifier for UTF-8 under Windows 7 and up. A code page, or character encoding, specifies how to convert a Unicode code point to a sequence of bytes or back again.
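To make the code page idea concrete, here is a small Java sketch of how one character maps to different bytes under two encodings, and what happens when UTF-8 bytes are read back with the wrong one. The class name CodePageDemo and the helper hex are hypothetical, and ISO-8859-1 is only a stand-in for a legacy console code page (it is used because every JVM is guaranteed to support it).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CodePageDemo {
    // Hypothetical helper: encodes the text with the given charset and
    // returns the resulting bytes as space-separated hex pairs.
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u00e9"; // the single character "é"

        System.out.println(hex(text, StandardCharsets.UTF_8));      // prints C3 A9
        System.out.println(hex(text, StandardCharsets.ISO_8859_1)); // prints E9

        // Decoding UTF-8 bytes with a single-byte charset turns one
        // character into two, which is how garbled console output arises.
        String garbled = new String(text.getBytes(StandardCharsets.UTF_8),
                                    StandardCharsets.ISO_8859_1);
        System.out.println(garbled.length()); // prints 2
    }
}
```

This is presumably why the output looks garbled when the JVM emits UTF-8 while the console is still on its default code page: both sides have to agree on the encoding.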

answered Sep 22 '22 16:09 by erickson