
How do you specify a Java file.encoding value consistent with the underlying Windows code page?

I have a Java application that receives data over a socket using an InputStreamReader. It reports "Cp1252" from its getEncoding method:

/* java.net. */ Socket Sock = ...;
InputStreamReader is = new InputStreamReader(Sock.getInputStream());
System.out.println("Character encoding = " + is.getEncoding());
// Prints "Character encoding = Cp1252"

That doesn't necessarily match what the system reports as its code page. For example:

C:\>chcp
Active code page: 850

The application may receive byte 0x81, which in code page 850 represents the character ü. The program interprets that byte with code page 1252, which doesn't define any character at that value, so I get a question mark instead.
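
To make the mismatch concrete, here is a minimal sketch that decodes the same byte under both code pages. The class and method names are made up for illustration, and it assumes the JDK in use ships the extended charsets (Cp850 is not in the guaranteed minimal set).

```java
import java.nio.charset.Charset;

// Hypothetical demo class: decodes one byte under two different code pages.
final class CodePageMismatch {
    // Decode a single byte value using the named charset.
    static String decode(int byteValue, String charsetName) {
        return new String(new byte[] { (byte) byteValue }, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String oem = decode(0x81, "Cp850");   // OEM code page reported by chcp
        String ansi = decode(0x81, "Cp1252"); // what the reader actually used
        System.out.printf("Cp850:  U+%04X%n", (int) oem.charAt(0));
        System.out.printf("Cp1252: U+%04X%n", (int) ansi.charAt(0));
    }
}
```

Under Cp850 the byte decodes to U+00FC (ü). Under Cp1252 it has no mapping, so Java substitutes the replacement character U+FFFD, which typically shows up as a question mark once printed.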

I was able to work around this problem for one customer who used code page 850 by adding another command-line option in the batch file that launches the application:

java.exe -Dfile.encoding=Cp850 ...

But not all my customers use code page 850, of course. How can I get Java to use a code page that's compatible with the underlying Windows system? My preference would be something I could just put in the batch file, leaving the Java code untouched:

set ENC=...
java.exe -Dfile.encoding=%ENC% ...

Rob Kennedy asked Aug 26 '09 19:08


1 Answer

The default encoding used by cmd.exe is Cp850 (or whatever "OEM" code page is native to the OS); the system encoding, which Java picks up as its default, is Cp1252 (or whatever "ANSI" code page is native to the OS). Gory details here. One way to discover the console encoding is through native code (GetConsoleOutputCP returns the current console output code page; GetACP returns the default "ANSI" code page; etc.).
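
If native code is not an option, a rougher alternative is to take the text that chcp prints and map the number to a Java charset. The sketch below only does the parsing and mapping; the class and method names are hypothetical, and it assumes the code page has a "Cp" alias in the JDK, which holds for Cp850 and Cp1252 but not necessarily for every code page.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turns chcp output into a Java Charset.
final class ConsoleCodePage {
    // Last run of digits in the text; the label wording varies with the
    // Windows display language, so only the number is relied upon.
    private static final Pattern TRAILING_NUMBER = Pattern.compile("(\\d+)\\D*$");

    static int parseChcp(String chcpOutput) {
        Matcher m = TRAILING_NUMBER.matcher(chcpOutput.trim());
        if (!m.find()) {
            throw new IllegalArgumentException("No code page number in: " + chcpOutput);
        }
        return Integer.parseInt(m.group(1));
    }

    static Charset charsetFor(int codePage) {
        // 65001 is the Windows identifier for UTF-8 and has no "Cp" alias.
        if (codePage == 65001) {
            return StandardCharsets.UTF_8;
        }
        // Throws UnsupportedCharsetException if the JDK has no such alias.
        return Charset.forName("Cp" + codePage);
    }

    public static void main(String[] args) {
        String line = args.length > 0 ? String.join(" ", args) : "Active code page: 850";
        System.out.println(charsetFor(parseChcp(line)));
    }
}
```

Note that this reports the console's code page, which is not necessarily the encoding the remote end of a socket used.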

Altering the encoding via the -D switch is going to affect all your default encoding mechanisms, including redirected stdout/stdin/stderr. It is not an ideal solution.
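
If touching the Java code is acceptable, a narrower option is to pass an explicit charset to the InputStreamReader and leave file.encoding alone. This is only a sketch: the property name app.socket.encoding is invented for the example, and a ByteArrayInputStream stands in for the socket's input stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical helper: decodes a stream with an explicitly chosen charset.
final class SocketDecoding {
    // Made-up property name; set it with -Dapp.socket.encoding=Cp850
    static final String ENCODING_PROPERTY = "app.socket.encoding";

    // Reads the whole stream as text using the named charset.
    static String readAll(InputStream in, String charsetName) {
        try (Reader reader = new InputStreamReader(in, Charset.forName(charsetName))) {
            StringBuilder text = new StringBuilder();
            for (int c; (c = reader.read()) != -1; ) {
                text.append((char) c);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Fall back to the JVM default when the property is not given.
        String name = System.getProperty(ENCODING_PROPERTY, Charset.defaultCharset().name());
        InputStream fakeSocket = new ByteArrayInputStream(new byte[] { (byte) 0x81 });
        String text = readAll(fakeSocket, name);
        System.out.printf("%s -> U+%04X%n", name, (int) text.charAt(0));
    }
}
```

Run with -Dapp.socket.encoding=Cp850, the byte 0x81 comes out as U+00FC, while stdout, stderr and every other default-encoding consumer keep whatever the JVM chose on its own.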

I came up with this WSH script that can set the console to the system ANSI codepage, but haven't figured out how to programmatically switch to a TrueType font.

'file:  setacp.vbs
'usage: cscript /Nologo setacp.vbs
Set objShell = CreateObject("WScript.Shell")
'replace ACP (ANSI) with OEMCP for default console CP
'CurrentControlSet always points at the control set in use
cp = objShell.RegRead("HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet" & _
                              "\Control\Nls\CodePage\ACP")
WScript.Echo "Switching console code page to " & cp
objShell.Exec "chcp.com " & cp

(This is my first WSH script, so it may be flawed - I'm not familiar with registry read permissions.)

Using a TrueType font is another requirement for using ANSI/Unicode with cmd.exe. I'm going to look at a programmatic switch to a better font when time permits.

McDowell answered Oct 06 '22 00:10